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HE,  YUN.  Multiscale  Signal  Processing  and  Shape  Analysis  for  an  Inverse  SAR 
Imaging  System.  (Under  the  direction  of  Prof.  Hamid  Krim.) 

The  great  challenge  in  signal  processing  is  to  devise  computationally  efficient  and  statisti¬ 
cally  optimal  algorithms  for  estimating  signals  from  noisy  background  and  understanding  their 
contents.  This  thesis  treats  the  problem  of  multiscale  signal  processing  and  shape  analysis  for 
an  Inverse  Synthetic  Aperture  Radar  (ISAR)  imaging  system.  To  address  some  of  the  limita¬ 
tions  of  conventional  techniques  in  radar  image  processing,  an  information  theoretic  approach 
for  target  motion  estimation  is  first  proposed.  A  wavelet  based  multiscale  method  for  shape 
enhancement  is  subsequently  derived  and  followed  by  a  regression  technique  for  shape  recog¬ 
nition. 

Building  on  entropy-based  divergence  measures  which  have  shown  promising  results  in 
many  areas  of  engineering  and  image  processing,  we  introduce  in  this  thesis  a  new  generalized 
divergence  measure,  namely  the  Jensen-Renyi  divergence.  Upon  establishing  its  properties  such 
as  convexity  and  its  upper  bound  etc.,  we  apply  it  to  image  registration  for  ISAR  focusing  as 
well  as  related  problems  in  data  fusion. 

Attempting  to  extend  current  approaches  to  signal  estimation  in  a  wavelet  framework, 
which  have  generally  relied  on  the  assumption  of  normally  distributed  perturbations,  we  pro¬ 
pose  a  novel  non-linear  filtering  technique,  as  a  pre-processing  step  for  the  shapes  obtained 
from  an  ISAR  imaging  system.  The  key  idea  is  to  project  a  noisy  shape  onto  a  wavelet  domain 
and  to  suppress  wavelet  coefficients  by  a  mask  derived  from  curvature  extrema  in  its  scale  space 
representation.  For  a  piecewise  smooth  signal,  it  can  be  shown  that  filtering  by  this  curvature 
mask  is  equivalent  to  preserving  the  signal  pointwise  Holder  exponents  at  the  singular  points, 
and  to  lifting  its  smoothness  at  all  the  remaining  points. 

To  identify  a  shape  independently  of  its  registration  information,  we  propose  matching 
two  configurations  by  regression,  using  notations  of  general  shape  spaces  and  procrustean  dis¬ 
tances.  In  particular,  we  study  the  generalized  matching  by  estimating  mean  shapes  in  two 
dimensions.  Simulation  results  show  that  matching  by  way  of  a  mean  shape  is  more  robust 
than  matching  target  shapes  directly. 
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Chapter 

D _ 

■  Introduction 


INVERSE  Synthetic  Aperture  Radar  (ISAR)  is  an  imaging  technique  that  achieves  a 
high  resolution  by  exploiting  the  relative  motion  between  a  stationary  radar  and 
a  moving  target.  This  is  accomplished  by  coherently  processing  the  returned  radar 
signals  so  as  to  synthesize  the  effect  of  a  larger  aperture  array  laid  out  along  the  target's 
path  of  motion.  One  important  application  of  ISAR  is  as  a  front-end  system  for  the 
purpose  of  target  recognition.  The  fundamental  goal  is  to  detect  and  recognize  objects 
of  interest  in  a  noisy  environment. 

A  typical  ISAR  imaging  system  consists  of  image  acquisition,  data  fusion,  target  shape 
extraction  and  enhancement,  and  finally  shape  recognition.  The  main  theme  of  this 
thesis  is  focused  on  information  theoretic  imaging  and  shape  analysis.  An  information 
theoretic  approach  for  ISAR  image  focusing  and  fusion  is  first  proposed  to  serve  as 
a  robust  stage  for  image  acquisition.  A  wavelet  based  multiscale  method  for  shape 
enhancement  is  subsequently  derived  to  estimate  a  shape  in  a  noisy  background  which 
is  followed  by  a  regression  technique  for  shape  recognition. 
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1.1  Problem  Motivation  and  Formulation 

Motivated  by  the  current  limitations  of  conventional  techniques  in  radar  image  focus¬ 
ing,  target  shape  estimation  and  recognition,  this  thesis  addresses  the  following  issues. 

•  Information  theoretic  approach  for  ISAR  image  focusing 

The  ISAR  imagery  is  induced  by  the  target  rotation,  which  in  turn  causes  time  varying 
spectra  of  the  reflected  signals  and  blurs  the  resulting  image.  When  a  target  exhibits 
complex  motion,  such  as  rotation  and  maneuvering,  a  standard  motion  compensation 
is  not  adequate  to  generate  an  acceptable  image  for  viewing  and  analysis.  In  this  thesis, 
we  tackle  this  problem  by  an  information  theoretic  approach. 

In  the  work  of  Woods  [1]  and  Viola  [2],  mutual  information,  a  basic  concept  from  in¬ 
formation  theory,  is  introduced  as  a  measure  of  evaluating  the  similarity  between  im¬ 
ages.  When  the  two  images  are  properly  matched,  corresponding  areas  overlap,  and 
the  resulting  joint  histogram  contains  high  values  for  the  pixel  combinations  of  the 
corresponding  regions.  When  the  images  are  mis-registered,  non-matched  areas  also 
overlap  and  will  contribute  to  additional  pixel  combinations  in  the  joint  histogram.  In 
case  of  misregistration,  the  joint  histogram  has  less  significant  peaks  and  is  more  dis¬ 
persed  than  that  of  the  correct  alignment  of  images.  The  registration  criterion  is  hence 
to  find  a  transformation  such  that  the  mutual  information  of  the  corresponding  pixel 
pair  intensity  values  in  the  matching  images  is  maximized.  This  approach  is  accepted 
widely  [3]  as  one  of  the  most  accurate  and  robust  registration  measures.  Following  the 
same  argument.  Hero  [4]  et  al.  extends  the  concept  to  apply  Renyi  entropy  to  measure 
the  joint  histogram  as  a  dissimilarity  metric  between  images. 

Inspired  by  this  previous  work,  and  looking  to  address  their  limitation  in  often  difficult 
imagery,  we  introduce  in  this  paper  a  novel  generalized  information  theoretic  measure, 
namely  the  Jensen-Renyi  divergence  and  defined  in  terms  of  Renyi  entropy  [5].  Jensen- 
Renyi  divergence  is  defined  as  a  similarity  measurement  among  any  finite  number  of 
weighted  probability  distributions.  Shaimon  mutual  information  is  a  special  case  of 
the  Jensen-Renyi  divergence.  This  generalization  provides  us  an  ability  to  control  the 
measurement  sensitivity  of  the  joint  histogram. 
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The  objective  of  ISAR  image  registration  is  to  estimate  the  target  motion  during  the 
imaging  time.  Let  7(^,0, 7)  be  a  Euclidean  transformation  with  translational  parameter 
^  =  (4, 4)/  rotational  parameter  9  and  scaling  parameter  7.  Given  two  ISAR  image 
frames  /i  and  /2,  the  estimates  of  target  motion  parameters  (/*,  0*,  7*)  are  given  by 

(r,r,7*)  =  aigmax  Djj^  7(1, e,j)f2)  (1-1) 

(1,0, V 

where  Djj^  (•)  is  an  induced  similarity  measure  based  on  the  proposed  Jensen-Renyi 
divergence  of  order  a  and  weight  oj,  which  is  maximal  when  fi  matches  T(i,e,-i)f2-  As  the 
radar  tracks  the  target,  the  reflected  signal  is  continuously  recorded  during  the  imaging 
time.  By  registering  a  sequence  of  consecutive  image  frames,  the  target  motion 

during  the  imaging  time  can  be  estimated  by  interpolating  {(/*,  di,  Based  on  the 

estimated  trajectory  of  the  target,  translational  motion  compensation  (TMC),  and  rota¬ 
tional  motion  compensation  (RMC)  [6]  can  be  used  to  generate  a  focused  image  of  the 
target. 

•  Multiscale  Signal  Enhancement:  Beyond  the  Normality  and  Independence  As¬ 
sumption 

To  fulfill  the  goal  of  an  ISAR  imaging  system  of  recognizing  objects  of  interest  in  a  noisy 
environment,  shape  enhancement  is  required  and  constitutes  a  crucial  step.  Donoho 
and  Johnstone  [7]  first  showed  that  effective  noise  suppression  may  be  achieved  by 
wavelet  shrinkage,  in  comparison  to  traditional  linear  methods.  Given  the  noisy  wavelet 
coefficients,  i.e.  the  true  wavelet  coefficients  plus  a  noise  term,  and  assuming  that  one 
has  knowledge  of  the  true  wavelet  coefficients,  an  ideal  filter  sets  a  noisy  coefficient  to 
zero  if  the  noise  variance  is  greater  than  the  square  of  the  true  wavelet  coefficient; 
otherwise  the  noisy  coefficient  is  kept.  In  this  way,  the  mean  square  error  of  this  ideal 
estimator  is  the  minimum  of  (7  and  the  square  of  the  coefficient.  Under  the  assumption 
of  i.i.d.  normal  noise,  it  can  shown  that  a  soft  thresholding  estimator  achieves  a  risk  at 
most  0(log  M)  times  the  risk  of  this  ideal  estimator,  where  M  is  the  length  of  the  obser¬ 
vation. 

To  choose  an  appropriate  threshold,  Donoho  and  Johnstone  [7]  have  taken  a  minimax 
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approach  to  characterize  the  signal,  and  they  proved  that  the  estimation  risk  is  close  to 
the  minimax  risk  by  setting  a  threshold 

T  =  (Ta/2  logg  M. 

Krim  and  Pesquet  [8]  have  given  an  alternative  derivation  for  this  threshold,  using 
Rissanen's  Minimum  Description  Length  (MDL)  criterion  [9]  and  the  assumption  of 
normally  distributed  noise.  The  threshold  T  increase  with  M  is  due  to  the  tail  of  the 
Gaussian  distribution,  which  tends  to  generate  larger  noise  coefficients  when  sample 
size  increases.  This  threshold  is  not  optimal,  and  in  general  a  lower  threshold  reduces 
the  risk.  To  refine  the  threshold,  a  SureShrink  [10]  procedure  is  proposed.  Sureshrink 
calculates  thresholds  by  the  principle  of  minimizing  the  Stein  unbiased  estimate  of  risk 
for  threshold  estimates.  SureShrink  is  also  based  on  the  assumption  of  i.i.d.  normal 
noise,  which  does  not  hold  for  ISAR  applications.  For  non-Gaussian  type  of  noise, 
Neumaim  [11],  Averkamp  and  Houdre  [12]  studied  the  choice  of  thresholds  by  having 
recourse  to  asymptotics.  Wavelet  thresholding  theory  is,  however,  based  on  the  as¬ 
sumption  that  we  know  the  statistics  of  the  noise  to  determine  an  adequate  threshold. 
This  makes  the  algorithm  less  flexible  and  less  adaptive  to  different  scenarios  which  can 
result  in  an  even  worse  reconstruction.  Compensation  for  the  lack  of  a  prior  knowledge 
of  the  noise  statistics  may  be  handled  by  adopting  the  minimax  principle  [13]  upon  de¬ 
riving  the  worst  case  noise  distribution. 

Points  of  sharp  variations  are  often  among  the  most  important  features  for  analyzing 
properties  of  transient  piecewise  smooth  signals.  To  characterize  the  singular  struc¬ 
tures,  Holder  exponents  [14]  provide  a  pointwise  measure  of  a  function  over  a  time 
interval.  Due  to  the  pioneering  work  by  Jaffard  [15]  and  Meyer  [16],  it  can  be  shown 
that  a  local  signal  singularity  of  a  function  is  characterized  by  the  decay  of  its  wavelet 
transform  amplitude  across  scales. 

In  this  thesis,  a  novel  nonlinear  filtering  technique  is  proposed.  We  assume  that  a  prior 
knowledge  about  pointwise  smoothness  measure  of  the  signal  is  known  or  can  be  ex¬ 
tracted.  However,  this  smoothness  property  of  the  signal  is  corrupted  by  additive  noise, 
which  in  general  has  a  uniform  Holder  exponent  less  than  1.  We  view  the  denoising 
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problem  as  one  of  carefully  controlling  the  Holder  exponents  of  measured  data  with  a 
goal  of  extracting  the  signal  portion  with  some  smoothness  fidelity  to  the  original.  Let 
/  G  L^(]R).  We  consider  an  additive  noise  model.  The  measured  data  are 

Z{t)  =  f{t)  +  Nit),  (1.2) 

where  the  noise  is  modeled  by  the  realization  of  a  zero  mean  random  process  N.  De¬ 
note  af{-)  and  Q;Ar(-)  characterize  the  pomtwise  Holder  exponent  of  /(•)  and  N{-)  re¬ 
spectively.  A  ideal  operator  T  satisfies  the  following  two  conditions: 

f  =  TZ  admits  Q;/(-)  as  its  pomtwise  Holder  exponent 
-^V{t)  =  Z{t)  —  f{t)  admits  Q;Ar(-)  as  its  pomtwise  Holder  exponent 
This  non-linear  filter  is  optimal  in  the  sense  of  recovering  the  smoothness  of  the  true 
underlying  signal. 

•  Information  theoretic  approach  for  multiscale  image  fusion 

With  the  development  of  new  imaging  sensors  arises  the  need  for  image  processing 
techniques  that  can  effectively  fuse  images  from  different  sensors  into  a  single  coherent 
composition  for  interpretation.  In  order  to  make  use  of  inherent  redundancy  and  ex¬ 
tended  coverage  of  multiple  sensors,  we  propose  a  multiscale  approach  for  pixel  level 
image  fusion.  The  ultimate  goal  is  to  reduce  human/ machine  error  in  detection  and 
recognition  of  objects. 

Over  the  past  two  decades,  a  wide  variety  of  pixel-level  image  fusion  algorithms  has 
been  developed.  These  techniques  may  be  classified  into  linear  superposition,  logical 
filter  [17],  mathematical  morphology  [18],  image  algebra  [19]  [20],  artificial  neural  net¬ 
work  [21],  and  simulated  armealing  [22]  methods.  Each  of  these  algorithms  focuses  on 
the  fact  that  the  fused  image  reveals  new  information  concerning  features  that  can  not 
be  perceived  in  individual  sensor  images.  However,  some  useful  information  has  been 
discarded  since  each  fusion  scheme  tends  to  emphasize  different  attributes  of  the  im¬ 
age.  Luo  [23]  provides  a  detailed  review  of  these  techniques. 

Inspired  by  the  fact  that  the  human  visual  system  processes  and  analyzes  image  in¬ 
formation  at  different  scales,  researchers  recently  proposed  a  multiscale  based  fusion 
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method  which  is  widely  accepted  [24]  as  one  of  the  most  effective  techniques  for  image 
fusion.  Wavelet  theory  has  played  a  particularly  important  role  in  multiscale  analysis. 
A  number  of  papers  [25]  [26]  [27]  have  addressed  fusion  algorithms  based  on  the  or¬ 
thogonal  wavelet  transform.  A  major  drawback  in  the  recent  pursuit  of  wavelet-based 
fusion  algorithms  is  due  to  a  lack  of  a  good  fusion  scheme.  Most  fusion  rules  so  far 
proposed  are  in  essence  more  or  less  similar  to  "choose  max"  scheme  proposed  by  Burt 
[28],  which  introduces  a  significant  amount  of  high  frequency  noise  due  to  the  sudden 
switch  of  the  fused  wavelet  coefficient  to  that  which  is  maximum  of  the  source.  This 
high  frequency  noise  is  particularly  undesirable  to  visual  perception. 

In  this  thesis,  we  apply  a  biorthogonal  wavelet  transform  to  the  pixel  level  image  fu¬ 
sion.  It  is  possible  to  construct  smooth  biorthogonal  wavelets  of  compact  support 
which  are  either  symmetric  or  antisymmetric.  At  the  exception  of  a  Haar  wavelet,  it 
has  been  shown  [29]  that  symmetric  orthogonal  wavelets  are  impossible  to  construct. 
Symmetric  or  antisymmetric  wavelets  are  synthesized  with  perfect  reconstruction  fil¬ 
ters  having  a  linear  phase.  This  is  a  desirable  property  for  image  fusion  applications. 
Unlike  the  "choose  max"  type  of  selection  rules,  we  propose  an  information  theoretic 
fusion  scheme.  For  each  pixel  in  a  source  image,  a  vector  consisting  of  wavelet  coef¬ 
ficients  at  that  pixel  position  across  scales  is  formed  to  indicate  the  "activity"  of  that 
pixel.  We  denote  these  indicator  vectors  of  all  the  pixels  in  a  source  image  as  its  activ¬ 
ity  map.  To  make  a  reasonable  comparison  among  activity  indicator  vectors,  we  apply 
our  newly  proposed  divergence  measure,  Jensen-Renyi  divergence,  which  is  defined  in 
terms  of  Renyi  entropy. 

Let  fi,  f 2,  ■■■,  fn  ■  ^  E  be  digital  images  of  the  same  scene  taken  from  different  sen¬ 

sors.  Denote 

Wfi  =  {dKj,n),d-(j,n),4(j,n),aj(L,n)}o<j<L,nez2,  i  =  1,2,  ...n 

as  a  biorthogonal  wavelet  image  representation  of  fi.  Our  fusion  scheme  cab  be  formu¬ 
lated  as  the  following  optimization  problem: 

{n  L  L 

i=i  j=i  2jneDo  i=i  2jneDi 
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where  Dq  is  the  set  of  pixels  whose  activity  patterns  are  similar  in  all  the  source  images, 
while  Di  is  the  set  of  pixels  whose  activity  patterns  are  different.  T  is  the  set  of  all  the 
images  /  whose  wavelet  transform  satisfies 


mm{W fi{j,n))  <  Wf{j,n)  <  max(lT/i(j,  n)), 

for  any  0  <  j  <  L  and  n  G  This  constraint  makes  sure  that  the  solution  stays  in  the 
closure  of  T,  i.e.,  no  image  outside  the  scenario  we  are  contemplating. 

•  Shape  recognition 

The  geometrical  description  of  an  object  can  be  decomposed  into  registration  and  shape 
information.  For  example,  an  object's  location,  rotation  and  size  could  be  the  registra¬ 
tion  information  and  the  geometrical  information  that  remains  is  the  shape  of  the  object. 
An  object's  shape  is  invariant  under  registration  transformations  and  two  objects  have 
the  same  shape  if  they  can  be  registered  to  match  exactly. 

The  pioneers  of  this  topic  of  general  shape  and  registration  analysis  are  Kendall  [30] 
and  Bookstein  [31].  Some  reference  and  reviews  include  Goodall  [32],  Kent  [33],  Dry- 
den  and  Mardia  [34]. 

The  difficulty  of  target  recognition  in  ISAR  imagery  is  to  identify  a  target  shape  in  the 
presence  of  interference  regardless  of  its  position,  scale  and  orientation.  In  an  effort  to 
overcome  this  difficulty,  we  describe  matching  of  two  configurations  in  Euclidean  and 
affine  shape  spaces  using  regression  techniques,  and  we  further  address  robustness  is¬ 
sues  and  matching  by  estimation. 


1.2  Thesis  Organization  and  Main  Contributions 


This  section  summarizes  the  organization  of  the  remaining  thesis,  along  with  a  brief 
description  of  the  main  contributions. 
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Chapter  2.  Inverse  Synthetic  Aperture  Radar  Imaging  System 

Inverse  Synthetic  Aperture  Radar  System  transmits  electro-magnetic  waves  to  a  target 
and  coherently  integrates  the  returned  signals  to  synthesize  the  effect  of  a  larger  aper¬ 
ture  array.  The  spatial  distribution  of  the  reflectivity  density  of  a  target,  referred  to  as 
the  image  of  the  target,  is  usually  mapped  onto  a  range-azimuth  plain.  In  this  chapter, 
we  briefly  introduce  the  range  and  azimuth  processing,  and  necessary  procedures  to 
construct  an  ISAR  image  from  reflected  and  measured  signals. 

Chapter  3.  Inverse  SAR  Image  Registration:  An  Information  Theoretic  Approach 

In  this  chapter,  a  new  generalized  divergence  measure,  Jensen-Renyi  divergence,  is  pro¬ 
posed.  Some  properties  such  as  convexity  and  its  upper  bound  are  derived.  Based  on 
the  Jensen-Renyi  divergence,  we  propose  a  new  approach  to  the  problem  of  ISAR  image 
registration.  Our  approach  applies  Jensen-Renyi  divergence  to  measure  the  statistical 
dependence  between  consecutive  ISAR  image  frames,  which  would  be  maximal  if  the 
images  are  geometrically  aligned.  Simulation  results  demonstrate  that  the  proposed 
method  is  efficient  and  effective  in  tracking  the  trajectory  of  a  target.  The  major  results 
of  this  chapter  have  been  published  in  [35],  [36]  and  [37]. 

Chapter  4.  Introduction  to  Multiscale  Analysis 

In  this  chapter,  we  briefly  review  the  concept  of  multiscale  analysis.  We  study  the  prop¬ 
erties  of  an  operator  which  approximates  a  signal  at  a  given  resolution.  We  show  that 
the  difference  of  a  signal  at  different  resolutions  can  be  extracted  by  decomposing  the 
signal  on  a  wavelet  orthonormal  basis.  The  development  of  orthonormal  wavelet  bases 
has  opened  a  new  bridge  between  approximation  theory  and  signal  processing.  In  the 
last  section  of  this  chapter,  we  discuss  a  hard/ soft  threshold  estimator  and  SureShrink 
in  particular  detail,  and  illustrate  its  adaptivity  to  unknown  smoothness  via  wavelet 
shrinkage. 

Chapter  5.  Multiscale  Signal  Enhancement:  Beyond  the  Normality  and  Indepen¬ 
dence  Assumption 

Most  existing  approaches  to  denoising  or  signal  enhancement  in  a  wavelet-based  frame¬ 
work  have  generally  relied  on  the  assumption  of  normally  distributed  and  independent 
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perturbations.  In  practice,  this  assumption  is  often  violated  and  sometimes,  even  the 
prior  information  of  a  probability  distribution  of  the  noise  process  is  unavailable.  To 
relax  this  assumption,  we  propose  a  novel  non-linear  filtering  technique  in  this  chapter. 
The  key  idea  is  to  project  a  noisy  signal  onto  a  wavelet  domain  and  to  suppress  wavelet 
coefficients  by  a  mask  derived  from  its  curvature  extrema  in  a  scale  space  representa¬ 
tion.  For  a  piecewise  smooth  signal,  it  can  be  shown  that  filtering  by  this  curvature 
mask  is  equivalent  to  preserving  the  signal  pointwise  Holder  exponents  at  the  singular 
points  and  lifting  its  smoothness  at  all  the  remaining  points.  The  major  results  of  this 
chapter  have  been  published  in  [38]  and  [39]. 

Chapter  6.  A  Multiscale  Approach  for  Pixel  Level  Image  Fusion 

Pixel  level  image  fusion  refers  to  the  processing  and  synergistic  combination  of  infor¬ 
mation  gathered  by  various  imaging  sources  to  provide  a  better  understanding  of  a 
scene.  We  formulate  the  image  fusion  as  an  optimization  problem  and  propose  an  in¬ 
formation  theoretic  approach  in  a  multiscale  framework  to  solve  it.  A  biorthogonal 
wavelet  transform  of  each  source  image  is  first  calculated,  and  the  new  fusion  algo¬ 
rithm  applies  Jensen-Rmyi  divergence  to  construct  a  composite  of  wavelet  coefficients 
according  to  the  measurement  of  the  information  patterns  inherent  in  source  images. 
Experimental  results  on  fusion  of  multi-sensor  navigation  images,  multi-modality  med¬ 
ical  images,  multi-spectral  remote  sensing  images,  and  multi-focus  optical  images  are 
presented  to  illustrate  the  proposed  fusion  scheme.  The  major  results  of  this  chapter 
have  been  published  in  [40]. 

Chapter  7.  Shape  Recognition 

The  geometrical  description  of  an  object  can  be  decomposed  into  registration  and  shape 
information.  The  goal  of  shape  recognition  is  to  identify  a  shape  regardless  of  its  reg¬ 
istration  information.  In  this  chapter,  we  describe  the  matching  of  two  configurations 
using  a  regression  technique,  making  coimections  with  general  shape  spaces  and  pro- 
crustean  distances.  In  particular,  we  study  the  generalized  matching  by  estimation  in 
Euclidean  and  affine  shape  spaces.  Simulation  results  show  that  matching  by  way  of  a 
mean  shape  is  more  robust  than  matching  target  shapes  directly. 
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Chapter  8.  Conclusions  and  Discussions 

In  this  chapter,  we  briefly  summarize  contributions  in  this  thesis  and  present  the  overall 
conclusions  which  can  be  drawn  from  the  results  ot  our  research.  We  also  present  some 
suggestions  for  extending  this  work. 


Chaptw 


2 


Inverse  Synthetic 
Aperture  Radar  Imag¬ 
ing  System 


INVERSE  Synthetic  Aperture  Radar  (ISAR)  is  a  microwave  imaging  system  capable 
of  producing  high  resolution  imagery  from  data  collected  by  a  relatively  small  an- 
teima.  ISAR  can  be  explained  in  terms  of  spotlight  SAR  [6],  as  illustrated  in  Eigure  (2.1). 
Spotlight  SAR  is  obtained  as  the  radar  anteima  constantly  tracks  a  particular  target  area 
of  interest.  The  same  data  would  be  collected  if  the  radar  were  stationary  and  the  target 
area  were  rotating.  The  target  rotation  relative  to  the  radar  is  used  to  generate  the  target 
image.  This  is  precisely  the  idea  of  ISAR. 

Eigure  (2.2)  illustrates  the  data  collection  from  an  air  target  by  a  stationary  radar  as 
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V  ' 


Stationary  Radar 

Rotating  Target 

Moving  Radar 

Stationary  Target 

(a) 

(b) 

Figure  2.2:  SAR/ISAR  Equivalence:  (a)  ISAR;  (b)  SAR  equivalence 


the  target  rotates  through  an  angle  \!/.  The  spotlight  SAR  equivalent  geometry  is  the 
moving  radar  of  Figure  (2.2b),  collecting  the  same  data  while  flying  a  circular  segment 
around  an  identical  but  non-rotating  target.  The  SAR  aperture  length  L  in  Figure  (2.2b) 
corresponds  to  integration  angle  in  figure  (2.2a). 


2.1  Range  Processing 

ISAR  imagery  represents  reflectivity  magnitude  associated  with  the  illuminated  target. 
In  the  terminology  of  radar  signal  processing,  the  direction  of  radar  Line  of  Sight  (LOS) 
is  referred  to  as  range  and  the  direction  orthogonal  to  range  is  referred  to  as  cross-range 
or  azimuth. 

Range  is  determined  by  measuring  the  time  it  takes  for  a  transmitted  signal  to  travel 
a  round-trip  distance  between  radar  and  target.  The  ability  of  the  radar  to  determine 
the  range  of  a  particular  scatferer  in  a  vicinity  of  other  scatterers  depends  on  the  range 
resolution. 

Target  reflectivity  density  is  a  function  of  frequency  and  viewing  angle,  and  the  as¬ 
sumption  is  that  it  does  not  vary  significantly  over  the  bandwidth  of  transmitted  sig¬ 
nal,  which  is  considered  narrow  compared  to  the  carrier  frequency,  or  over  the  narrow 
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range  of  radar  viewing  angles. 

Consider  the  simplified  geometry  shown  in  Figure  (2.3).  An  anterma  located  at  the  ori¬ 
gin  illuminates  a  line  of  scatterers  centered  at  rc  =  i?,  having  length  W  and  reflectivity 
g  :  M  ^  C.  Let  the  transmitted  signal  §(t)  be  a  pulse  of  duration  Tp  and  bandwidth  /3 
given  by 

3(i)  =  (2.1) 

J-p 

where  s{t)  is  the  baseband  equivalent  signal  of  s{t),  and 


w(t) 


h  \t\<l 

0,  otherwise. 


(2.2) 


Then,  ignoring  the  round-trip  attenuation,  the  returned  signal  can  be  represented  as  the 
convolution  of  target  reflectivity  density  with  the  transmitted  signal 


r{x)  =  q{x)  -k  s{2x/c) 


(2.3) 


where  c  is  the  speed  of  light,  and  2x  jc  is  the  round  trip  delay. 

An  estimate  of  the  reflectivity  density  can  be  obtained  by  passing  r(a;)  through  a  matched 
filter  whose  impulse  response  is  hr{x)  =  s*{—2xlc).  Therefore,  the  estimate  of  q{x)  is 

q{x)  =  r{x) -k  s*{—2x/c) 

=  q{x) -k[s{2x/c) -k  s*{—2x/c)].  (2.4) 


Let's  define 


p{t)  =  s{t)-ks*{—t)  (2.5) 

then  we  can  rewrite  (2.4)  as 

q{x)  =  q{x)  -k  p{2x/c)  (2.6) 

Since  the  transmitted  pulse  usually  has  a  large  time-bandwidth  product,  p{t)  can  be 
approximated  by  a  stnc(-)  function.  The  estimate  of  target  reflectivity  density  may  thus 
be  represented  as 


q{x)  =  q{x)  ★  sinc(27r/3x/c). 


(2.7) 
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Figure  2.3:  Simplified  geometry  for  range  processing. 


The  range  resolution  is  hence  given  by  [41] 


(2.8) 


2.2  Cross-Range  Processing 

To  gain  a  basic  understanding  of  the  cross-range  resolving  mechanism  in  ISAR,  let's 
consider  Figure  (2.4)  with  a  line  of  scatterers  having  the  reflectivity  q{y).  As  the  radar 
is  moving  from  2;  =  — L/2t0  2;  =  L/2,  the  two-way  phase  advance  at  cross-range  y  is 


Itt  Itt 

'iy(z)  ~  —dr  =  —z  ■  sm7 
A  A 


(2.9) 


For  the  case  oi  y  R  and  L  R,  siny  approaches  y/R  and  2;  approaches  R(f),  —  \l//2  < 
(f)  <  \!//2,  Equation  (2.9)  can  be  re-expressed  in  terms  of  angle  (j) 


■r  /  47r  ^ ,  y  Itt  ,  \!/  ,  \!/ 


(2.10) 


Then,  the  echo  response  from  the  line  scatterers  at  (j)  is  given  by 


/+00 

q{y)e^^'^ydy. 

■00 


(2.11) 


An  estimate  of  the  reflectivity  density  can  be  obtained  by  integrating  the  echo  response 
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Figure  2.4:  Simplified  geometry  for  cross-range  processing 


over  a  small  integration  angle  \1/, 

r+f 

q{y)  =  /  g{(l))e~^^^yd(l) 

=  [  9i(l))wi^)e-^^^yd(f) 

J  —oo  ^ 

27r 

=  '^q{y)-ksmc{—^y),  (2.12) 


where  w{-)  is  the  window  function  defined  in  equation  (2.2).  Therefore  the  cross-range 
resolution  is  given  by  [6] 


Src  = 


(2.13) 


2.3  Maximum  Integration  Angle 

The  maximum  integration  angle  for  ISAR  imaging  system  before  uncorrected  rotation 
produces  defocusing  in  cross-range  can  be  defined  in  terms  of  allowable  two-way  phase 
deviation.  We  assume  that  the  radar  is  in  the  far  field  of  the  target  so  that  no  significant 
range  deviation  exists  over  the  target's  cross-range  extent.  What  remains  is  the  range 
deviation  produced  when  a  scatterer  first  approaches  and  then  recedes  from  the  radar 
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during  rotation.  As  in  Figure  (2.5),  for  a  small  H/, 

=  (r  +  5rY 

Solving  for  with  r  >>  8r,  we  have 


^  =  ( — y'^ 

r 


(2.14) 


(2.15) 


Assume  a  maximum  allowable  two-way  phase  deviation  of  tt/S  as  a  criterion  for  focus, 
the  corresponding  allowable  range  deviation  5r  is  A/32.  The  maximum  integration 
angle  [6]  then  becomes 

4'max=-Y^-  (2.16) 

The  cross-range  resolution  in  unfocused  ISAR  is  limited  by  the  maximum  integration 
angle  H/max/  and  a  Focused  Aperture  ISAR  is  desirable  for  a  larger  integration  angle.  Let's 
consider  the  echo  signal  from  scatterer  1,  located  at  (r,  9  =  0)  when  t  =  0  as  shown  in 
Figure  (2.5),  the  phase  advance  term  at  time  t  is 

^i(f)  =  = - 1 ^ 


A  2r 


(2.17) 
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where  the  target  rotates  at  a  constant  angular  rotation  rate  u.  The  two  way  phase  ad¬ 
vance  for  scatterer  2,  which  is  located  at  (r,  —9)  when  t  =  0,  is 


^2(t)  = 


Itt  {ut  — 

T  2 


(2.18) 


A  focused  aperture  for  ISAR  is  achieved  by  subtracting  from  \!/2(f)/  and  we  have 

,2 


-  ’l'i(f)  =  -^[-Uty  +  1^] 


(2.19) 


where  y  =  r9.  Define  the  target  rotation  angle  </>  =  ut,  then  we  can  rewrite  Equation 
(2.19)  in  terms  of  (j), 

=  -y  (-#  +  1^1  (2.20) 

Equating  (2.20)  with  (2.11),  we  obtain  the  range  resolution  [6] 


5rc 


1  A 

2lJf' 


(2.21) 


where  T  is  the  integration  time.  Equation  (2.21)  for  focused  ISAR  with  uT  =  is 
the  same  as  the  cross-range  resolution  (2.13)  for  an  unfocused,  small  integration  angle 
ISAR. 

The  ISAR  imaging  is  induced  by  target  motion.  During  the  imaging  time,  the  scatterers 
must  remain  in  their  range  cells.  Reflectivity  density  function  won't  remain  the  same 
over  a  wide  range  of  radar  viewing  angles.  Therefore  we  can  not  use  an  arbitrary  large 
integration  angle  in  Equation  (2.21).  Optimal  Integration  Angle  [35]  need  to  be  estimated 
to  achieve  the  highest  possible  cross-range  resolution  and  prevent  defocusing  in  the 
image. 


2.4  Range  Doppler  Imaging 

The  architecture  of  the  ISAR  image  formation  is  illustrated  in  Figure  (2.6),  with  a  stepped- 
frequency  (SF)  waveform.  Other  wide-hand  radar  waveforms,  such  as  a  linear  FM 
chirp  pulse  can  also  be  used  but  with  different  range  compression  techniques.  Using 


CHAPTER  2.  INVERSE  SAR  IMAGING  SYSTEM 


18 


Figure  2.6:  ISAR  imaging  system  architecture 


SF  waveforms,  a  radar  typically  transmits  a  sequence  of  N  bursts.  Each  burst  consists 
of  M  narrow-band  pulses.  Within  each  burst,  the  center  frequency  of  each  successive 
pulse  is  increased  by  a  constant  frequency  step  A/.  The  total  bandwidth  of  the  burst, 
i.e.,  P  =  MAf,  determines  the  radar  range  resolution.  The  total  number  of  bursts  for  a 
given  imaging  time  determines  the  Doppler  or  cross-range  resolution. 

To  form  a  radar  image,  N  bursts  of  returned  signals  first  pass  through  a  quadrature 
demodulator  to  obtain  in-phase  and  quadrature  signals  at  baseband.  An  M  x  N  array 
of  complex  data,  {Gm,n}o<m<M-i,  o<n<v-i/  is  constructed  to  represent  an  unprocessed 
spatial  frequency  signature  of  the  target.  The  radar  processor  uses  the  frequency  signa¬ 
tures  as  the  raw  data  to  perform  range  compression  and  standard  motion  compensa- 
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Figure  2.7:  ISAR  Image  of  A  Small  Boat 

tion.  Range  compression  functions  as  a  matched  filter  to  resolve  range.  For  SF  signals, 
the  range  compression  performs  an  M-point  IFF  to  each  of  the  N  received  frequency 
signatures 

r  =  T-^{G)  (2.22) 

where  denotes  the  IFT  operation  with  respect  to  variable  m.  All  together,  N  range 
profiles,  each  containing  M  range  cells,  are  thus  obtained. 

Along  each  range  cell,  N  range  profiles  constitute  a  time  history  series  of  the  target. 
The  Fourier  imaging  approach  applies  a  FFT  to  the  time  history  series  and  generates  a 
A-point  Doppler  spectrum,  or  Doppler  profile.  By  combining  the  N  Doppler  spectra  at 
M  range  cells,  a  M  x  N  image  is  formed 

I  =  (2.23) 

where  denotes  a  FFT  operation  with  respect  to  variable  n.  The  radar  image  /  is 
hence  the  target's  reflectivity  function  mapped  onto  the  range-Doppler  plane.  Figure 
(2.7)  shows  such  an  ISAR  image. 
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Chapter 


Inverse  SAR  Image 
Registration:  An  In¬ 
formation  Theoretic 
Approach 


Entropy  based  divergence  measures  have  shown  promising  results  in  many  ar¬ 
eas  of  engineering  and  image  processing.  In  this  paper,  a  new  generalized  di¬ 
vergence  measure,  Jensen-Renyi  divergence,  is  proposed.  Some  properties  such  as  con¬ 
vexity  and  its  upper  bound  are  derived.  Based  on  the  Jensen-Renyi  divergence,  we 
propose  a  new  approach  to  the  problem  of  ISAR  (Inverse  Synthetic  Aperture  Radar) 
image  registration.  The  goal  of  ISAR  image  registration  is  to  estimate  the  target  motion 
during  the  imaging  time.  Our  approach  applies  Jensen-Renyi  divergence  to  measure 
the  statistical  dependence  between  consecutive  ISAR  image  frames,  which  would  be 
maximal  if  the  images  are  geometrically  aligned.  Simulation  results  demonstrate  that 
the  proposed  method  is  efficient  and  effective. 


3.1  Introduction 


Image  registration  is  an  important  problem  in  computer  vision  [42]  [43],  remote  sens¬ 
ing  data  processing  [44]  [45]  and  medical  image  analysis  [46]  [47].  The  goal  of  image 
registration  is  to  find  a  spatial  transformation  such  that  a  similarity  metric  achieves  its 
maximum  between  two  or  more  images  taken  at  different  times,  from  different  sensors, 
or  from  different  viewpoints. 
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One  such  example  which  is  the  primary  interest  in  the  sequel  is  Inverse  Synthetic  Aper¬ 
ture  Radar  (ISAR)  imaging.  ISAR  is  a  microwave  imaging  system  capable  of  producing 
high  resolution  imagery  from  data  collected  by  a  relatively  small  anteima.  ISAR  imag¬ 
ing  is  induced  by  target  motion,  which  unfortunately  also  blurs  the  resulting  image. 
After  a  conventional  ISAR  translational  focusing  process,  image  registration  can  be  ap¬ 
plied  to  estimate  the  target  rotational  motion  parameter,  upon  which  polar  re-formating 
may  be  used  to  achieve  a  higher  resolution  image.  Related  work  in  this  area  includes 
the  image  registration  in  interferometric  SAR  processing  by  Gabriel  [48],  Li  [49]  and 
Lin  [50]  and  Fomaro  [51]. 

Over  the  last  three  decades,  a  wide  variety  of  registration  techniques  have  been  devel¬ 
oped  for  different  applications.  These  techniques  can  be  classified  [52]  into  correlation 
methods,  Fourier  methods,  landmark  mapping,  and  elastic  model-based  matching. 
Given  two  images  fi,  f 2  ■  I  x  I  e  ^  E,  correlation  methods  [53]  calculate  the 
normalized  two-dimensional  cross-correlation  function  C{fi,  between  /i  and 

/2,  where  T  is  a  Euclidean  transformation  with  translational  parameter  I  =  (4,  ly),  rota¬ 
tional  parameter  0  and  scaling  parameter  7.  The  registration  problem  may  be  succinctly 
stated  as 


(3.1) 


The  correlation  methods  are  generally  limited  to  registration  problems  in  which  the 
image  is  misaligned  by  only  a  small  rigid  transformation.  In  addition,  the  peak  of  the 
correlation  may  not  be  clearly  discernible  in  the  presence  of  noise.  Fourier  methods  [54] 
are  the  frequency  domain  equivalent  of  the  correlation  methods.  Fourier  methods  make 
use  of  the  translation  property  of  the  Fourier  transform  and  search  for  the  optimal  spec¬ 
trum  matching  between  two  images.  Since  rotation  is  invariant  under  a  Fourier  trans¬ 
formation,  rotating  an  image  merely  rotates  the  Fourier  transform  of  that  image  [55]. 
If  we  denote  Fi ,  F2  as  the  two-dimensional  Fourier  transform  of  /i ,  /2  respectively,  we 
obtain  the  phase  of  the  cross-power  spectrum  rotated  by  9  as 


)  ^?/)  ^2/) 


(3.2) 


|-Ll  (tJj;,  tJj^)  FgF2{oJx^^y')\ 
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To  determine  the  rotational  parameter  6,  one  proceeds  to  maximize  the  two-dimensional 
inverse  Fourier  transformation  of  PeioJx,  <^y)r  i-e.,  a  cross-correlation  which  is  as  peaked 
or  as  impulsive  as  possible,  and  the  location  of  that  impulse  is  exactly  the  translational 
parameter  /.  In  light  of  their  equivalence  to  the  correlation  methods,  Fourier  methods 
are  also  limited  to  registration  problems  with  a  small  rigid  transformation.  If  there 
exists  spatially  local  variation,  then  both  the  correlation  methods  and  the  Fourier  meth¬ 
ods  would  fail.  For  cases  of  unknown  misalignment  type,  landmark  mapping  tech¬ 
niques  [56]  and  elastic  model-based  matching  [57]  [58]  may  be  used  to  tackle  the  regis¬ 
tration  problem.  Landmark  mapping  techniques  extract  feature  points  from  a  reference 
image  and  a  target  image  respectively,  then  apply  piecewise  interpolation  to  compute 
a  transformation  to  map  the  feature  point  sets  from  the  reference  image  to  the  target 
image.  Landmark  based  methods  are  usually  labor  intensive  and  their  accuracy  de¬ 
pends  on  the  degree  of  reliability  of  the  feature  points.  Instead  of  finding  the  mapping 
between  the  feature  point  sets,  elastic  model-based  matching  methods  model  the  dis¬ 
tortion  in  the  image  as  the  deformation  of  an  elastic  material.  The  resulting  registration 
transformation  is  the  deformation  with  a  minimal  bending  and  stretching  energy.  Prac¬ 
tical  elastic  model-based  methods  [59]  are  based  on  computation  expensive  iterative 
algorithms,  and  the  choice  of  feature  points  plays  a  crucial  role  in  its  performance. 

In  the  work  of  Woods  [1]  and  Viola  [2],  mutual  information,  a  basic  concept  from  in¬ 
formation  theory,  is  introduced  as  a  measure  of  evaluating  the  similarity  between  im¬ 
ages.  When  the  two  images  are  properly  matched,  corresponding  areas  overlap,  and 
the  resulting  joint  histogram  contains  high  values  for  the  pixel  combinations  of  the 
corresponding  regions.  When  the  images  are  mis-registered,  non-matched  areas  also 
overlap  and  will  contribute  to  additional  pixel  combinations  in  the  joint  histogram.  In 
case  of  misregistration,  the  joint  histogram  has  less  significant  peaks  and  is  more  dis¬ 
persed  than  that  of  the  correct  alignment  of  images.  The  registration  criterion  is  hence 
to  find  a  transformation  such  that  the  mutual  information  of  the  corresponding  pixel 
pair  intensity  values  in  the  matching  images  is  maximized.  This  approach  is  accepted 
widely  [3]  as  one  of  the  most  accurate  and  robust  registration  measures.  Following  the 
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same  argument.  Hero  [4]  et  al.  extends  the  concept  to  apply  Renyi  entropy  to  measure 
the  joint  histogram  as  a  dissimilarity  metric  between  images. 

Inspired  by  this  previous  work,  and  looking  to  address  their  limitation  in  often  dif¬ 
ficult  imagery,  we  introduce  in  this  chapter  a  novel  generalized  information  theoretic 
measure,  namely  the  Jensen-Renyi  divergence  and  defined  in  terms  of  Renyi  entropy 
[5].  Jensen-Renyi  divergence  is  defined  as  a  similarity  measurement  among  any  finite 
number  of  weighted  probability  distributions.  Sharmon  mutual  information  is  a  special 
case  of  the  Jensen-Renyi  divergence.  This  generalization  provides  us  an  ability  to  con¬ 
trol  the  measurement  sensitivity  of  the  joint  histogram,  to  ultimately  result  in  a  better 
registration  accuracy. 

In  the  next  section,  we  give  a  brief  statement  of  the  problem.  In  section  3.3,  we  in¬ 
troduce  the  Jensen-Renyi  divergence  and  its  properties.  In  Section  3.4,  we  derive  the 
performance  upper  bounds  of  the  Jensen-Renyi  divergence  in  terms  of  the  Bayes  error 
and  also  of  the  asymptotic  error  of  the  Nearest  Neighbor  Classifier.  Section  3.5  describes 
the  concepts  of  image  registration  with  the  Jensen-Renyi  divergence.  Numerical  exper¬ 
iments  for  ISAR  image  registration  is  demonstrated  in  Section  3.6.  Finally,  we  provide 
concluding  remarks  in  Section  3.7. 


3.2  Problem  Statement 

ISAR  imagery  is  induced  by  target  motion,  and  the  target  motion  in  turn  causes  time- 
varying  spectra  of  the  received  signals.  Motion  compensation  has  to  be  applied  to 
obtain  a  high  resolution  image.  The  objective  of  ISAR  image  registration  is  to  estimate 
the  target  motion  during  the  imaging  time.  Let  T{i^e,-y)  be  a  Euclidean  transformation 
with  translational  parameter  I  =  (4,  ly),  rotational  parameter  9  and  scaling  parameter 
7.  Given  two  ISAR  image  frames  fi  and  /2,  the  estimates  of  target  motion  parameters 
are  given  by 

(/*,  r ,  7*)  =  arg  max  (/i, 


(3.3) 
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where  (•)  is  an  induced  similarity  measure  based  on  Jensen-Renyi  divergence  of 
order  a  and  weight  u,  which  is  maximal  when  /i  matches  As  the  radar  tracks 

the  target,  the  reflected  signal  is  continuously  recorded  during  the  imaging  time.  By 
registering  a  sequence  of  consecutive  image  frames,  {fi}f=Q,  the  target  motion  during 
the  imaging  time  can  be  estimated  by  interpolating  {(/j,  0^,7^)},^!.  Based  on  the  esti¬ 
mated  trajectory  of  the  target,  translational  motion  compensation  (TMC),  and  rotational 
motion  compensation  (RMC)  [6]  can  be  used  to  generate  a  focused  image  of  the  target. 

3.3  The  Jensen-Renyi  Divergence 

Let  k  E  N  and  X  =  {xi,X2, . . . ,  be  a  finite  set  with  a  probability  distribution  p  = 
(pi, P2,  ■  ■  ■ , Pk)f  i-e.  YL]=i  Pj  =  1  pj  =  P{xj)  >  0,  where  P(-)  denotes  the  probability. 
Renyi  entropy  is  a  generalization  of  Shaimon  enfropy  [60],  and  is  defined  as 


n  >  0  and  n  ^  1. 


(3.4) 


For  n  >  1,  the  Renyi  entropy  is  neither  concave  nor  convex. 

For  a  E  (0, 1),  it  is  easy  to  see  that  Renyi  entropy  is  concave,  and  tends  to  Shaimon 
entropy  iF(p)  as  a  ^  1.  It  can  easily  be  verified  thaf  is  a  non-increasing  function  of 
a,  and  hence 


Ra{p)>H{p),  Vae(0,l). 


(3.5) 


In  the  sequel,  we  will  restrict  a  E  (0, 1),  unless  otherwise  specified,  and  will  use  a  base 
2  logarithm,  i.e.,  the  measurement  unit  is  bits. 

As  shown  in  Figure  3.1,  uncertainty  is  at  a  minimum  when  Sharmon  entropy  is  used, 
and  it  increases  as  a  decreases.  Renyi  entropy  attains  a  maximum  uncertainty  when  a 
is  equal  to  zero. 

Definition  3.1  Let  pi,p2, . . .  ,Pnbe  n  probability  distributions  on  X  and  on  =  {ui,U2, ,  cOn) 
be  a  weight  vector  such  that  >  0-  Jensen-Renyi  divergence  is  defined 


CHAPTER  3.  INVERSE  SAR  IMAGE  REGISTRATION 


26 


a=0 

— 

a=0.2 

— 

a=0.5 

a=0.7 

Shannon 

o' - ^ ^ ^ ^ ^ ^ ^ ^ ^ - ' 

0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 

Probability 


Figure  3.1:  Shannon  and  Renyi  entropy  of  Bernoulli  distribution  p  =  {p,l  —  p)  for 
different  values  of  a 


as 


(^iP^ 


i=l 


where  Ra{p)  is  the  Renyi  entropy,  n  >  0  and  n  ^  1. 


n 

'^UiRaiPi), 

i=l 


Using  the  Jensen  inequality,  it  is  easy  to  check  that  the  Jensen-Renyi  divergence  is  non¬ 
negative  for  n  e  (0, 1).  It  is  also  symmetric  and  vanishes  if  and  only  if  the  probability 
distributions  Pi,P2,  ■  ■  ■  ,Pn  are  equal,  for  all  a  >  0.  Figure  3.2  illustrates  the  three  di¬ 
mensional  representation  of  the  Jensen-Renyi  divergence  for  two  Bernoulli  probability 
distributions,  with  a  =  0.7. 

When  a  ^  1,  the  Jensen-Renyi  divergence  is  exactly  the  generalized  Jensen-Shaimon 
divergence  [61]. 

Unlike  other  entropy-based  divergence  measures  such  as  the  well-known  Kullback  di¬ 
vergence,  the  Jensen-Renyi  divergence  has  the  advantage  of  being  symmetric  and  gen- 
eralizable  to  any  finite  number  of  probability  distributions,  with  a  possibility  of  assign¬ 
ing  weights  to  these  distributions. 
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q  p 

Figure  3.2:  3D  representation  of  Jensen-Renyi  divergence  JI^  {p,  q),  p  =  (p,  1  —  p), 
q=  (q^l-  q)^a  =  0.7,  ca  =  (0.5,  0.5). 

The  following  result  establishes  the  convexity  of  the  Jensen-Renyi  divergence  of  a  set 
of  probability  distributions. 

Proposition  3.1  For  a  e  (0,1),  the  Jensen-Renyi  divergence  JR^  is  a  convex  function  of 
PuP2,  ,Pn- 
Proof:  see  Appendix. 

The  following  result,  in  a  sense,  clarifies  and  justifies  calling  upon  the  Jensen-Renyi 
divergence  as  a  measure  of  disparity  among  probability  distributions. 

Proposition  3.2  The  Jensen-Renyi  divergence  achieves  its  maximum  value  when  Pi,P2,  ■■■  ,Pn 
are  degenerate  distributions. 

Proof:  The  domain  of  is  a  convex  polytope  in  which  the  vertices  are  degenerate 
probability  distributions.  That  is,  the  maximum  value  of  the  Jensen-Renyi  divergence 
occurs  at  one  of  the  degenerate  distributions.  ■ 


Since  the  Jensen-Renyi  divergence  is  a  convex  function  oi  Pi,p2,  ■  ■  ■  ,p„,  it  achieves  its 
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maximum  value  when  the  Renyi  entropy  function  of  the  ca-weighted  average  of  degen¬ 
erate  probability  distributions,  achieves  its  maximum  value  as  well. 

Assigning  weights  ui  to  the  degenerate  distributions  Ai,  A2, . . . ,  A„,  A*  =  {Sij},  j  = 
1,  2, ...  A;,  the  following  upper  bound 


JRa  <  Ra  ( )  , 


(3.6) 


i=l 


which  easily  falls  out  of  the  Jensen-Renyi  divergence,  may  be  used  as  a  starting  point. 
Without  loss  of  generality,  consider  the  Jensen-Renyi  divergence  with  equal  weights 
uji  =  1/n  for  all  i,  and  denote  it  simply  by  JRa,  to  write 


JRa  <  Ra  (  ^Ai/n) 


i=l 


k  n 


n 


where 


j=l  i=l 
CX 

=  Ra{a)^ - 7log(n), 

a  —  1 


a  =  (tti,  02, . . . ,  ttk)  such  that  aj  =  5^. 

2  =  1 


(3.7) 


(3.8) 


Since  Ai,  A2, . . . ,  A„  are  degenerate  distributions,  then  we  have  =  n,\/k  >  n. 

From  (3.7),  it  is  clear  that  JR^  achieves  its  maximum  value  when  Ra{a)  does  as  well. 

In  order  to  maximize  Ra{a),  the  concept  of  majorization  [62]  will  be  used.  Let  (x[i],  X[2], . . . ,  X[q) 
denote  a  non-increasing  ordering  of  the  components  of  a  vector  x  =  {xi,X2.,  ■  ■  ■  ^Xk)- 

Definition  3.2  Let  a  and  h  eBL  ,  a  is  said  to  be  majorized  by  b,  written  a  -<  b,  if 

Ek  1 

j=i  ®[i]  —  Aj=i  0[i] 

®[i]  A  ^[i])  i  =  1,2, . . .  ,k  —  1. 

Definition  3.3  A  real-valued  function  f  defined  on  a  set  Q  c  is  said  to  be  Schur-concave 
on  Q  if 


a  -<  b 


4>{a)  >  fib),  Va,  b  eLI. 
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Define  the  function  g{aj)  =  a'^,  a  E  (0, 1),  on  an  interval  J  c  M.  It  is  clear  that  g  is  a 
concave  function  on  J,  thus  (/i(a)  =  di^j)  Schur-concave  [62]  on  J^,  that  is 

k  k 

a-<b  '^g{aj)>^g{bj). 

i=i  i=i 

Since  log(-)  is  an  increasing  function,  and  a  E  (0, 1),  it  follows  that 

a  -<  b  =>  Raid)  >  Ra{b). 

Therefore,  Ra{  )  is  a  Schur-concave  function.  The  following  result  establishes  the  max¬ 
imum  value  of  the  Jensen-Renyi  divergence. 

Proposition  3.3  Let  p^,p2, . . .  ,Pnbe  n  probability  distributions  with 

P^  =  {Pil,Pi2,  ■  ■  ■,Pik),  =  1,  Pij  >  0- 

Ifk  =  r  (mod  n),  0  <  r  <  n,  then 

JRa<^^iog{k-r),  (3.9) 

where  a  E  (0, 1). 

Proof:  It  is  clear  that  the  vector 

k—r  r 

' - " - ^  . - * - . 

g  =  {n/{k-r),...,n/{k-r),0,...,0) 

is  majorized  by  the  vector  a  defined  in  (3.8).  Hence,  Ra{a)  <  Ra{g)-  Invoking  Equation 
(3.7)  completes  the  proof.  ■ 

According  to  Proposition  3.3,  and  for  the  special  case  of  A;  =  0  (mod  n)  the  following 
inequality  holds 

JRa{Pl,P2,---,Pn)  <  l0g(A:). 

It  is  of  special  interest  for  the  Jensen-Renyi  divergence  between  the  histograms  and 
Pf,  with  weights  {/5,  1  —  /3},  f  E  [0, 1].  Let 


p={l-  /5/2)p„  +  (/5/2)pj 
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then  the  corresponding  Jensen-Renyi  divergence  can  then  be  expressed  as  a  function  of 

/5 

TO  TT  l^)Pa  + I^Pb)+ Ra{Pa) 

JRa{p)  =  Ra{P) - ^ - • 


JRJP) 


P 


Figure  3.3:  Jensen-Renyi  divergence  as  a  function  of  /3 

Proposition  3.4  IfPa  ^  Pb  then  the  Jensen-Renyi  divergence  JRa{Ji)  achieves  its  maximum 
value  when  j3  =  1. 

Proof:  see  Appendix. 

Figure  (3.3)  illustrates  the  Jensen-Renyi  divergence  as  a  function  of  f  whenp^  =  (0.3, 0.2, 0.5), 
Pi,  =  (0.2,  0.4,  0.4)  and  a  =  0.7. 


3.4  The  Jensen-Renyi  Divergence:  Performance  Bounds 

In  this  section,  performance  bounds  of  the  Jensen-Renyi  divergence  in  terms  of  the 
Bayes  error  and  also  of  the  asymptotic  error  of  the  NN  classifier  are  derived. 

Let  X  =  {xi,  X2,  ...Xh}  be  a  set  of  feature  vectors  and  C  =  {ci,  C2, . . . ,  c„}  be  a  set  of  n 
classes.  By  a  classifier  we  mean  a  function  f  :  X  ^  C  that  classifies  a  given  feature 
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vector  (pattern)  x  E  X  io  the  class  c  =  f{x).  Denote  X,C  be  two  random  variables 
taking  values  in  X  and  C  respectively.  It  is  well  known  that  the  classifier  that  minimizes 
the  error  probability  P{f{X)  ^  C)  is  the  result  of  the  Bayes  classifier  with  an  error  Lb 
written  in  discrete  form  as 

k 

Lb  =  ini  P{f{X)  ^  C}  =  1  -  max{uiPij}, 

i=l 

where  uji  =  P{C  =  q)  are  the  class  probabilities,  and  pij  =  P{X  =  Xj\C  =  q)  are 
the  class-conditional  probabilities.  Denote  by  ca  =  and  Pj  =  {pij)i<j<k,'^i  = 

l,...,n. 

Proposition  3.5  The  Jensen-Renyi  divergence  is  upper  bounded  by 

JRa  (PuP2,  ■■■,Pn)  <  -  2Lb,  (3.10) 

where  Ra{w)  =  Ra{C),  and  a  e  (0, 1). 

Proof:  According  to  proposition  1,  we  have 

Ro^iC)  -  JR^{p„P,,  ...,Pn)=  RaiC\X). 

It  has  been  proven  in  [63]  that  H{C\X)  >  2Lb,  then  (3.5)  implies  Ra{C\X)  >  2Lb-  This 
completes  the  proof.  ■ 

A  method  that  provides  an  estimate  for  the  Bayes  error  without  requiring  knowledge 
of  the  underlying  class  distributions  is  based  on  the  NN  classifier  which  assigns  a  test 
pattern  to  the  class  of  its  closest  pattern  according  to  some  metric  [64]. 

For  n  sufficiently  large,  the  following  result  relating  the  Bayes  error  Lb  and  the  asymp¬ 
totic  error  of  the  NN  classifier  holds  [64] 

-  - -Latat^  <  Lb  <  Lffff.  (3.11) 

Using  (3.10),  the  following  inequality  is  deduced 

JK  <  W  -  (i  -  . 
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3.5  Image  Registration  with  the  J ensen-Renyi  Divergence 


(1)  (2) 


Figure  3.4:  Conditional  probability  distributions 


Let  /i,  /2  be  two  digital  images  defined  on  a  bounded  domain  Q  C  FF,  the  goal  of  image 
registration  is  to  determine  the  spatial  transformation  parameters  (/*,  9*,Y)  such  that 

(r,r,7*)  =  aTgmaxDjj^{fi,T(i,e,^)f2)  (3.12) 

(1,9,1) 

=  argmax  (Pi(/i,  Tii,e,^)f2),P2ifi,  T{i,e,i)f2),  ■  ■  ■,Pn{h.  T{i,e,i)f2)) 
(1,9,1) 

where  (•)  is  the  Jensen-Renyi  divergence  of  order  a  and  weight  ca. 

Denote  A  =  {xi,X2, . . . ,  Xn}  and  y  =  {yi,  ?/2,  •  •  • ,  Vn}  the  sets  of  pixel  intensity  values  of 
/i  and  T{i^e,i)f2  respectively,  and  let  A,  Y  be  two  random  variables  taking  values  in  X 
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and  y.  Piifi,  T(i,e,ry)f2)  =  {Pij)i<j<n  is  defined  as 

Pij  =  P{Y  =  yj\X  =Xi),  j  =  l,2,...,n 

which  is  the  conditional  probability  ot  T(i,0,j)f2  given  /i  for  the  corresponding  pixel 
pairs.  Here  the  Jensen-Renyi  divergence  acts  as  a  similarity  measure  between  images. 
If  the  two  images  are  exactly  matched,  then  p-  =  i  =  1,  2, ...,  n.  Since  p[s  are 

degenerate  distributions,  by  Proposition  3.2,  the  Jensen-Renyi  divergence  is  maximized 
for  a  fixed  a  and  u.  Figure  (3.4.1-3.4.2)  show  two  brain  MRT  images  in  which  the 
misalignment  is  a  Euclidean  rotation.  The  conditional  probability  distributions  {p^} 
are  crisp,  as  in  Figure  (3.4.3),  when  the  two  images  are  aligned,  and  dispersed,  as  in 
Figure  (3.4.4),  when  they  are  not  matched. 

It  is  worth  noting  that  the  maximization  ot  the  Jensen-Renyi  divergence  holds  for  any  a 
and  oj  such  that  0  <  n  <  1  and  uji  >  0,  uji  =  1.  If  we  take  a  =  1  and  oji  =  P{X  =  xi) 
then,  by  Proposition  3.1,  the  Jensen-Renyi  divergence  is  exactly  the  Sharmon  mutual 
information.  Indeed,  the  Jensen-Renyi  divergence  induced  similarity  measure  provides 
a  more  general  framework  for  the  image  registration  problem. 


Figure  3.5:  Mutual  information  vs.  Jensen-Renyi  divergence  ot  uniform  weights 

If  the  two  images  /i  and  T{i,e,-i)f2  are  matched,  the  Jensen-Renyi  divergence  is  maxi¬ 
mized  for  any  valid  weight.  Assigning  uji  =  P{X  =  x,)  is  not  always  a  good  choice. 
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Figure  (3.5)  shows  the  registration  results  of  the  two  brain  images  in  Figure  (3.4)  us¬ 
ing  the  mutual  information  and  the  Jensen-Renyi  divergence  of  a  =  1  and  uniform 
weights.  The  peak  at  the  matching  point  generated  by  the  Jensen-Renyi  divergence  is 
clearly  much  higher  than  the  peak  by  the  mutual  information,  uji  =  P{X  =  Xi)  gives 
the  background  pixel  the  largest  weight.  In  the  presence  of  noise,  the  matching  in  back¬ 
ground  is  corrupted.  Mutual  information  may  fail  to  identify  the  registration  point. 
This  phenomenon  is  demonstrated  in  Figure  (3.6).  The  following  proposition  estab¬ 
lishes  the  optimality  of  the  uniform  weights  for  image  registration  in  the  context  of  the 
Jensen-Renyi  divergence. 

Proposition  3.6  Let  (3  be  a  uniform  weight  defined  as  /3i  =  1/n,  i  =  1,2, . . .  ,n  and  w  be  any 
vector  such  that  a;,  >  0,  ^^^=1  =  1-  If  the  misalignment  between  fi  and  /2  can  be  modeled 

by  a  spatial  transformation  T*,  then  the  following  inequality  holds 

jR^ipAfi, r72), . . .  ,Pn(/i, T72))  >  r72), . . .  ,p„(/i, r72)),  va  e  [0,  i]. 

(3.13) 

Proof:  p^  =  Ai,  i  =  1,2,. ..  ,n  when  /i  and  /2  are  aligned  by  the  spatial  transformation 
T*,  then  JR‘^ (•)  becomes 

n 

JR‘^{pf  h,Tf2),  .  .  .  ,Pn(/l,r72))  =  RaC^UJ^Af  = 

i=l 

Since  (3  -<  w  [62]  and  Ra{  )  is  Schur-concave,  we  obtain  Ra{P)  >  Ra{‘^)-  This  completes 
the  proof  of  the  proposition.  ■ 

After  assigning  uniform  weights  to  the  various  distributions  in  the  Jensen-Renyi  di¬ 
vergence,  a  free  parameter  a,  which  is  directly  related  to  the  measurement  sensitivity, 
remains  to  be  selected.  In  the  image  registration  problem,  one  desires  a  sharp  and 
distinguishable  peak  at  the  matching  point.  The  sharpness  of  the  Jensen-Renyi  diver¬ 
gence  can  be  characterized  by  the  maximal  value  as  well  as  the  width  of  the  peak.  The 
sharpest  peak  is  clearly  a  Dirac  function.  The  following  proposition  establishes  that  the 
maximal  value  of  the  Jensen-Renyi  divergence  is  independent  of  a  if  the  two  images 
are  aligned,  and  0  =  0  yields  the  sharpest  peak. 
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(3)  d{A,B) 


Mutual  Information 
JR  Divergence 


Figure  3.6:  Registration  result  in  the  presence  of  the  noise  with  a  =  1. 


Proposition  3.7  Let  /3  bea  uniform  weight  vector.  If  the  misalignment  between  fi  and  f2  can 
be  modeled  by  a  spatial  transformation  T*,  then 

r\f2), . . .  ,Pn{fu  r*/2))  =  logn,  Vn  e  [0, 1],  (3.14) 


In  case  of  a  =  0, 

JRa{PuP2,---,Pn)  =0 

for  any  probability  distribution  p^  such  that  pij  >  0,  i,j  =  l,2,...,n  and 

JRa{Pl^P2,---.Pn)  =logn 
if  and  only  ifp-  =  A^,  i  =  1,  2, . . . ,  n. 

Proof:  see  Appendix 

If  there  exists  local  variation  between  f  and  /2,  or,  if  the  registration  of  the  two  images  is 
in  the  presence  of  noise,  then  an  exact  alignment  T*  may  not  be  found.  The  conditional 
probability  distribution  pffi,  T*  f2)  is  no  longer  a  degenerate  distribution  in  this  case. 
The  following  proposition  establishes  that  taking  a  =  1  would  provide  a  higher  peak 
than  any  other  choice  of  a  for  the  non-ideal  alignment. 
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Proposition  3.8  Let  =  Aj  +  Sp^,  i  =  1,2, ...  ,n,  where  Sp^  =  {Spij)i<j<n  is  a  real  distor¬ 
tion  vector  such  that  pij  >  0,  ^Pij  =  0  Yl'i=i  ^Pij  =  0-  a:  he  a  weight  vector  and 

denote  JR‘*^{-)  as  the  Jensen-Renyi  divergence  with  a  =  1,  then  we  have 

JR^{p^,P2,...,pJ>JR‘^{Pi,P2,---,Pn),  Vae(0,l).  (3.15) 

Proof:  Observe  that  for  any  probability  distribution  p,  Ra{p)  >  H{p),  Vo  G  (0, 1),  then, 

n  n 

'^UiH{pf  <  ^UiRa{Pi),  Vo  e  (0, 1)  (3.16) 

i=l  i=l 

Since  YL^j=i  ^Pij  =  0)  ^Pij  =  0/  ^^^e  Renyi  entropy  of  o  =  1  is  exactly  the  Shan¬ 

non  entropy,  the  inequality  (3.15)  is  equivalent  to  the  inequality  (3.16).  This  completes 
the  proof  of  Proposihon  3.8.  ■ 


3.5.1  Discussion 


(1) 


e 


(2) 


Figure  3.7:  Effect  of  the  order  a  in  image  registration 

In  real  world  applications,  there  is  a  trade  oft  between  optimality  and  practicality  in 
choosing  a.  If  one  can  model  the  misalignment  between  /i  and  /2  completely  and  ac¬ 
curately,  q;  =  0  would  correspond  to  the  best  choice  since  it  generates  the  sharpest  peak 
at  the  matching  point.  It  is,  however,  also  the  least  robust  selection,  as  it  tends  to  make 
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all  the  p[s  the  same  as  the  uniform  distribution,  if  p-  is  not  degenerate  distribution  and 
Pij  >  0,  then  the  Jensen-Renyi  divergence  would  be  zero  for  the  whole  transformation 
parameter  space  as  in  case  where  the  adapted  transformation  group  can  not  "accu¬ 
rately"^  model  the  relationship  between  fi  and  /2  accurate  enough.  On  the  other  hand, 
a  =  1  is  the  most  robust  choice,  in  spite  of  also  resulting  in  the  least  sharp  peak.  The 
choice  of  a  therefore  depends  largely  on  the  accuracy  of  the  invoked  model  and  on  the 
specific  application  as  well  as  the  available  computational  resource.  As  an  example. 
Figure  (3.7.1)  demonstrates  the  registration  results  of  the  two  brain  images  in  Figure 
(3.4)  with  the  choice  of  different  a.  In  this  case,  a  =  0  is  the  best  choice  and  would  gen¬ 
erate  a  Dirac  function  with  a  peak  at  the  matching  point,  as  illustrated  in  Figure  (3.7.2). 


3.6  Numerical  Experiments:  ISAR  Image  Registration 

y 


V 

V 


Figure  3.8:  ISAR  geometry  of  a  moving  target 


Generating  an  ISAR  image  by  using  stepped  frequency  waveform  [6]  can  be  under¬ 
stood  as  a  process  of  estimating  the  target's  two-dimensional  reflectivity  density  func¬ 
tion  p{x,  y)  from  data  collected  in  the  frequency  space.  Suppose  a  stepped  frequency 
burst  consists  of  M  pulses  in  which  the  transmitted  frequency  linearly  increases  from 
ujq  to  ujq  +  (M  —  l)At(j,  where  ujq  is  the  base  frequency  in  rads/s  and  Auj  is  the  step  fre¬ 
quency.  Let  the  mth  transmitted  pulse  Sm  (t)  be  a  pulse  of  duration  Tp  and  expressed  in 


^within  some  acceptable  tolerance 
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complex  form  as 


f  —  7T)T 

J-  n 


where  Wm  =  Wq  +  mAcu,  and 


W{t) 


1,  0  <  i  <  1 

0,  otherwise. 


(3.17) 


(3.18) 


CO 

y 


Figure  3.9:  Polar  formatted  data  in  spatial  frequency  space 


Let's  define  uj{t)  =  Uq  +  {m  —  l)Auj,  mTp  <  t  <  {m  +  l)Tp,  m  =  0, 1,  ...M  —  1.  Under 
uniform  illumination,  the  reflected  signal  from  the  target  differential  area  dx  x  dy  at  the 
target  coordinate  {x,  y)  is 

h{x,  y,  t)  =  Ap{x,  0  <  t  <  (M  -  l)Tp  (3.19) 

where  A  is  a  constant  attenuation  factor,  which  we  can  set  to  1  without  a  loss  of  gen¬ 
erality.  The  distance  between  the  radar  anteima  and  the  target  reflection  point  lo¬ 
cated  at  {x,  y)  is  denoted  by  r{t).  We  obtain  the  expression  of  the  received  signal  for 
0  <  t  <  (M  —  l)rp  by  integrating  reflections  from  all  the  point  scatterers  in  the  target 

/+00  ^+oo 

/  h{x,y,t)dxdy 

■OO  j  —  00 

/+00  ^+oo 

■00  J  —  00 


(3.20) 
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After  quadrature  demodulation,  we  obtain 

/+00  ^+oo 

/  p{x,  0<t<{M-  l)Tp  (3.21) 

■OO  j  —  00 

It  can  be  observed  from  Figure  (3.8)  that,  for  target  dimension  that  are  relatively  smaller 
than  the  target  range  R,  the  distance  r(-)  from  the  radar  anteima  to  target  reflection 
point  located  at  {x,  y)  is 

r{t)  ~  R{t)  +  X  cos  9{t)  —  y  sin  9{t).  (3.22) 


Inserting  Equation  (3.22)  into  Equation  (3.21),  we  deduce  the  baseband  signal  in  terms 
of  target  coordinate  (x,  y)  and  rotation  angle  9 

/H-oo  ^+oo 

/  p{x,  0<t<{M-  l)Tp  (3.23) 

00  j  — OO 


where 

^x{t)  =  — —cos9{t)  (3.24) 

c 

and 

Uy{t)  =  — —sm9{t)  (3.25) 

are  spatial  frequency  quantities  defined  at  frequency  uj{t)  and  target  rotation  angle  9{t). 
The  phase  term  e-i2w(t)ii(t)/c  jg  related  to  the  target  translational  motion  only,  and  can 
by  compensated  by  traditional  translational  motion  compensation  methods. 

By  sampling  at  tm  =  {m  +  ^)Tp,  m  =  0, 1,  ...(M  —  1),  we  obtain  the  data 

collected  in  the  frequency  space  G{m)  as 


G{m) 


m  =  0, 1,  ...(M  —  1) 


(3.26) 


To  form  a  radar  image,  N  bursts  of  received  signal  are  sampled  and  organized  burst  by 
burst  into  a  M  x  N  two-dimensional  array,  which  is  shown  in  Figure  (3.9).  This  sample 
matrix  is  not  uniformly  spaced  in  the  spatial  frequency,  instead,  it  is  polar  formatted 
data.  The  Discrete  Fourier  Transform  processing  of  the  polar  formatted  data  would  re¬ 
sult  in  blurring  at  the  edges  of  the  target  reflectivity  image.  Figure  (3.10)  is  a  synthetic 
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Figure  3.10:  ISAR  image  of  moving  target  reconstructed  by  the  Discrete  Fourier  Trans¬ 
formation 


ISAR  image  of  an  aircraft  MIG-25  [65].  The  radar  is  assumed  to  be  operating  at  9GHz 
and  transmits  a  stepped-frequency  waveform.  Each  burst  consists  of  64  narrow-band 
pulses  stepped  in  frequency  from  pulse  to  pulse  by  a  fixed  frequency  step  of  8MHz. 
The  pulse  repetition  frequency  is  IbKHz.  Basic  motion  compensation  processing  has 
been  applied  to  the  data.  A  total  of  512  bursts  of  received  signal  are  taken  to  reconstruct 
the  image  of  this  aircraft,  which  corresponds  to  2.18s  integration  time.  As  we  can  see, 
the  resulting  image  is  defocused  due  to  the  target  rotation.  In  fact,  the  defocused  im¬ 
age  in  Figure  (3.10)  is  formed  by  overlapping  a  series  of  MIG-25s  at  different  viewing 
angles.  By  replacing  the  Fourier  transform  with  the  time  varying  spectral  analysis  tech¬ 
niques  [35]  [65],  we  can  take  a  sequence  of  snapshots  of  the  target  during  the  2.18s  of 
integration  time.  Figure  (3.11.1-3.11.6)  shows  the  trajectory  of  the  MIG-25,  with  6  image 
frames  taken  at  t  =  0.1280s,  0.4693s,  0.8107s,  1.1520s  ,  1.4933s,  1.8347s  respectively. 

Image  registration  can  be  applied  to  estimate  the  target  motion  from  this  sequence  of 
images.  For  the  synthetic  ISAR  images  shown  in  Figure(3.11),  we  search  for  the  rotation 
angles  between  a  sequence  of  image  frames  observed  in  a  time  interval 
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Figure  3.11:  Trajectory  of  a  sequence  of  MIG-25  image  frames 
[0,  r].  By  Equation  (3.12),  9^  is  given  by 
9*  =  argmax 

Figure  (3.12.1-3.12.5)  shows  the  rotation  angles  {9i  obtained  by  registering  the  6  con¬ 
secutive  MIG-25  image  frames.  As  can  already  be  seen  in  the  figures,  uniform  weights 
produce  the  sharpest  peak. 

By  interpolating  {9i}f^i,  we  obtain  an  estimated  trajectory  of  the  MIG-25  rotational  mo¬ 
tion  during  the  imaging  time,  as  shown  by  the  solid  line  in  Figure  (3.12.6).  The  dotted 
line  in  Figure  (3.12.6)  is  the  true  trajectory.  The  standard  deviation  is  0.5580°.  An  esti¬ 
mated  trajectory  of  a  target  is  particularly  important  since  it  may  be  subsequently  used 
in  polar  re-formating  [6]  and  re-sampling  the  received  signal  into  rectangular  format. 
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e 


Figure  3.12:  Image  registration  of  a  MIG-25  Trajectory 

This  results  in  a  focused  image  of  the  MIG-25  based  on  all  the  received  signals  in  the 
time  interval  [0,  2.18s],  as  demonstrated  in  Figure  (3.13). 


3.7  Conclusions 

A  new  generalized  divergence  measure,  Jensen-Renyi  divergence,  is  proposed  in  this 
paper.  We  prove  the  convexity  of  this  divergence  measure,  derive  its  maximum  value, 
and  analyze  its  performance  upper  bounds  in  terms  of  the  Bayes  error  of  nearest  neigh¬ 
bor  classifier.  Based  on  the  Jensen-Renyi  divergence,  we  propose  a  new  approach  to  the 
problem  of  image  registration.  Compared  to  the  mutual  information  based  registration 
techniques,  the  Jensen-Renyi  divergence  adjusts  its  weight  and  exponential  order  to 


3.8.  APPENDIX 


43 


control  the  measurement  sensitivity  of  the  joint  histogram.  This  flexibility  ultimately 
results  in  a  better  registration  accuracy 


Figure  3.13:  Reconstructed  MIG-25  by  polar  reformatting 


3.8  Appendix 

Proof  of  Proposition  3.1 

Denote  A  =  {xi,  X2,.  ■  ■ ,  Xn}  and  y  =  {?/i,  ?/2,  •  •  • ,  Un}-  Let  X,  Y  be  two  random  variables 
taking  values  in  X  and  y.  Recall  that  the  mutual  information  between  X  and  Y  is  given 
by  [66] 

I{X;  Y)  =  H{Y)  -  H{Y\X),  (3.27) 

where  H{Y)  is  the  Shannon  entropy  of  Y  and  H{Y\X)  is  the  conditional  Shannon  en¬ 
tropy  of  Y  given  X. 

Instead  of  using  Shannon  entropy  in  (3.27),  the  mutual  information  can  be  generalized 
using  Renyi  entropy.  Therefore,  the  n-mutual  information  can  be  defined  as 

4(X;  Y)  =  R^(Y)  -  R^{Y\X),  a  G  (0, 1), 


where  Ra  is  the  Renyi  entropy  of  order  n  e  (0,  !)• 
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Denote  by  P{xi)  =  uji,  P{Y  =  yj\X  =  Xi)  =  Pij  and  P{Y  =  i/j)  =  cjj,  then  it  is  easy  to 
check  that 

-  Ra{Y\X)  =  JE^ip„p„  . . .  ,pj,  (3.28) 

where  p^  =  {pij)i<j<k,  for  alH  =  1, . . . ,  n. 

For  fixed  uJi,  the  mutual  information  is  a  convex  function  of  pij  [66],  then  it  can  be 
verified  that  the  n-mutual  information  is  also  a  convex  function  of  pij,  leading  to  the 
Jensen-Renyi  divergence  a  convex  function  oi p^,p2, . . .  ^p^.  ■ 


Proof  of  Proposition  3.4 

Denote  by  p^  =  {Pak)kZo^  Pb  =  iPbk)kZo  and  dk  =  Pbk  -  Pak,  where  L  -  1  is  the  maximum 
gray  level  of  the  image.  The  Jensen-Renyi  divergence  can  then  be  written  as 


JRa{^) 


k=Q  ^  ' 

~  2(1 -a)  + 

^  ^  k=0 

1 


Simple  calculations  show  that  the  first  derivative  JR^{f3)  vanishes  for  /3  =  0,  and  the 
second  derivative  JR'^{j3)  is  positive,  that  is  JR'^{l3)  is  an  increasing  function  of  It 
follows  that  JR'^{(3)  >  0  for  all  /3  G  [0, 1].  Therefore,  JR^if)  is  an  increasing  function  of 
I3.  This  concludes  the  proof.  ■ 


jR'!(p,i.h,rh),...,pMuTh)) 

n 

=  R^{Y,I^Ai) 

i=l 


=  logn 


Proof  of  Proposition  3.7 
By  Proposition  3.6, 
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For  a  =  0, 


we  obtain 


Rn 


^  n  n 

Pi)  =  log  =  iog«,  ^Pij  >  o,J2pij  =  1 


i=l 


/o  ^  ft  ^ 

JRS(Pl,P2,---,Pn)  =  -^-^<^^Pi)  =  0- 

i=l  i=l 

If  a  =  0  and  =  A^,  i  =  1,  2, . . . ,  n,  denote  0°  =  0,  then  by  Proposition  3.6, 

JRaiPl^P2,---,Pn)  =  l0g«- 


This  concludes  the  proof  for  the  proposition. 
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Chapter 


Introduction  to  Multi¬ 
scale  Analysis 


IN  this  section,  we  briefly  review  the  concept  of  multiscale  analysis  [67],  We  study 
the  properties  of  the  operator  which  approximates  a  signal  at  a  given  resolution. 
We  show  that  the  difference  of  a  signal  at  different  resolutions  can  be  extracted  by  de¬ 
composing  the  signal  on  a  wavelet  orthonormal  basis.  In  L^(M),  a  wavelet  orthonormal 
basis  is  a  family  of  functions  —  n)}(j^„)e22,  which  is  built  by  dilating  and 

translating  a  unique  function  The  development  of  orthonormal  wavelet  bases  has 
opened  a  new  bridge  between  approximation  theory  and  signal  processing.  For  the 
application  of  signal  estimation  in  additive  noise  environment,  linear  operators  have 
long  predominated  because  of  their  simplicity,  despite  their  limited  performance.  It  is 
possible  to  keep  the  simplicity  while  improving  the  performance  with  non-linearities  in 
a  sparse  representation.  One  such  example  is  a  wavelet  thresholding  estimator,  which 
is  used  to  suppress  additive  noises  and  restore  signals  degraded  by  low-pass  filters. 


4.1  Multiscale  Approximation  of  L^(R) 


Adapting  the  signal  resolution  allows  one  to  process  only  the  relevant  details  for  a  par¬ 
ticular  task.  Burt  and  Adelson  [68]  introduced  a  multiresolution  pyramid  that  can  be 


used  to  process  a  low  resolution  first  and  then  selectively  increase  the  resolution  when 


47 


CHAPTER  4.  INTRODUCTION  TO  MULTISCALE  ANALYSIS 


48 


necessary.  The  approximation  of  a  function  /  at  a  resolution  2“^  is  specified  by  samples 
on  a  discrete  grid  which  provides  local  averages  of  /  over  neighborhoods  of  size  pro¬ 
portional  to  2h 


Definition  4.1  Let  {Vj},j  E  Zbea  set  of  closed  subspaces  o/L^(]R),  we  say  that  the  sequence 
e  Zis  a  multiresolution  approximation  o/L^(E),  if  the  following  conditions  hold: 

•  V(j,  n)  e  Z2,  f{t)  e  V,  ^  fit  -  2^n)  e  V2.  (4.1) 

.  VjeZ,V,+iCV,  (4.2) 

•Vj  e  Z,  fit)  E  Vj  ^  fi^)  E  V,+1  (4.3) 

+  00 

•  ,lto  Vy=  fl  V,  =  {0}  (4.4) 

j=-oo 
+  00 

•  lim  V.  =  (  I  I  Vd  =  L2(M)  (4.5) 

j=-oo 

•  There  exists  0(-)  such  that  {Of  —  n)}nei.  is  a  Riesz  basis  o/Vq 


Example:  Shannon  approximations  Frequency  band-limited  functions  yield  a  mul¬ 
tiresolution  approximations.  The  space  is  defined  as  the  set  of  functions  whose 
Fourier  transform  has  a  support  included  in  [— 2“%,  2“%].  It  can  be  shown  that  an 
orthonormal  basis  {Of  —  n)}nez  of  Vq  is  defined  by 

e(t)  =  ,4.6) 

irt 

All  the  other  properties  of  the  multiresolution  approximation  are  easily  verified. 

The  approximation  of  /  at  the  resolution  of  2“^  is  defined  as  the  orthogonal  projection 
Pvjf  on  Vj.  To  compute  this  projection,  we  must  find  an  orthonormal  basis  of  Vj.  The 
following  theorem  orthogonalizes  the  Riesz  basis  {Of  —  n)}„ez  and  constructs  an  or¬ 
thonormal  basis  of  each  subspace  Vj  by  dilating  and  translating  a  single  function  </>(•) 
called  a  scaling  function.  To  avoid  the  resolution  2~^  and  the  scale  2f  in  the  rest  of  the 
chapter  the  notation  of  resolution  is  dropped  and  Pvj  /  is  called  an  approximation  at 
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the  scale  2K 


Theorem  4.1  Let  be  a  multiresolution  approximation  and  4)  he  the  scaling  function 

whose  Fourier  transform  is 


(f){u) 


9{uj) 


Let  us  denote 

,  ,  1  ,^t  —  2^n 

The  family  {(t)j,n}n&  Is  an  orthonormal  basis  ofY  jfor  all  j  e  Z. 


(4.7) 


(4.8) 


The  orthogonal  projection  of  /  over  Vj  is  obtained  with  an  expression  in  the  scaling 
orthogonal  basis 


+  00 

j  f  —  ^  ^  /)  4’j,n  >  fj,n- 

k=—oo 


(4.9) 


The  inner  products 


«i[^]  =<  f,  4>j,n  > 


(4.10) 


provide  a  discrete  approximation  of  /  at  the  scale  2L 

For  the  case  of  Shannon  approximations,  we  have  constructed  Riesz  basis  {d{t  —  n)}n&, 
which  are  orthonormal  basis,  hence  4>{t)  =  0{t). 

A  multiresolution  approximation  is  entirely  characterized  by  the  scaling  function  f  that 
generates  an  orthonormal  basis  of  each  subspace  Y j.  We  study  the  properties  of  f 
which  guarantee  that  the  spaces  Yj  satisfy  all  the  conditions  of  a  multiresolution  ap¬ 
proximation.  It  is  proven  [67]  that  any  scaling  function  can  be  specified  by  a  discrete 
filter  called  a  conjugate  mirror  filter  [69].  The  multiresolution  causality  property  im¬ 
poses  that  Yj  C  Vj_i.  In  particular,  l/\/2(f){t/2)  G  Vi  C  Vq.  Since  {ff  —  n)}„ez  is  a 
basis  of  Vo,  we  can  decompose 


1 
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h[n](l){t-n), 


n=—oo 


(4.11) 
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with 

h[n]  =<  </'(^  -  n)  >  .  (4.12) 

Equation  (4.11)  relates  a  dilation  of  the  scaling  function  </>  by  2  to  its  integer  translations. 
The  sequence  h[n]  can  be  interpreted  as  a  discrete  filter. 

The  Fourier  transformation  of  bofh  sides  of  Equation  (4.11)  yields 

<^(2w)  =  -^h{u)4){u)  (4.13) 

for  h{ijj)  =  It  then  makes  sense  to  express  (^(w)  directly  as  a  product 

of  dilation  of  h{uj).  For  any  p  >  0,  Equation  (4.13)  implies 

<^(2-^’+^w)  =  -^h{2-Puj)^{2-Puj).  (4.14) 

V2 

If  (^(w)  is  continuous  at  w  =  0  then  limp_>+oo  (I){‘2~Puj)  =  <^(0).  By  substitution,  we  obtain 

=  (4.15) 

p=i  V  ^ 

The  following  theorem  by  Mallat  and  Meyer  [67]  gives  necessary  and  sufficient  condi¬ 
tions  on  h{uj)  to  guarantee  that  this  infinite  product  is  the  Fourier  transform  of  a  scaling 
function. 

Theorem  4.2  Let  cj)  €  L^(M)  he  an  integrable  scaling  function.  The  Fourier  transform  of 
h[n]  =<  :^^(|),  f{t  —  n)  >  satisfies 

Vw  e  M,  |/i(aj)|^ -|- |^(aj -|- 7r)|^  =  2,  (4.16) 


and 


m  =  7(2). 


(4.17) 


Conversely,  ifh{u)  is  2n  periodic  and  continuously  differentiable  in  a  neighborhood  ofuj  =  0,  if 
it  satisfies  Equation  (4.16)  and  (4.17),  then 


f{uj)  =  J]^ 

p=i 


h(2-Puj) 


is  the  Fourier  transform  of  a  scaling  function  f  e  L^(E). 


(4.18) 
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For  the  case  of  Shannon  multiresolution  approximation,  (^(w)  =  l[_jr,7r](<^)-  We  thus 
derive  from  Equation  (4.18)  that 

VW  e  [-7r,7r],  h{uj)  = 

4.2  Orthonormal  Wavelet  Basis 

in  the  multiscale  analysis,  we  are  especially  interested  in  the  difference  between  consec¬ 
utive  resolution  scales.  This  difference  is  often  called  a  detail  signal.  The  approximation 
at  resolution  and  2^  of  a  signal  are  respectively  equal  to  their  orthogonal  projections 
on  Vj_i  and  Vj.  It  can  also  be  shown  that  the  signal  details  at  resolution  2^  are  given 
by  an  orthogonal  projection  of  the  original  signal  onto  the  orthogonal  complement  of 
Vj  in  Vj_i.  Let  Oj  represent  the  orthogonal  complement  of  Vj,  we  then  have, 

where  0  denotes  a  direct  sum.  In  order  to  obtain  the  detail  signal  of  a  function  on  Oj, 
we  need  to  find  an  orthonormal  basis  of  Oj. 

Theorem  4.3  [70]  Let  (f)bea  scaling  function  and  h  the  corresponding  conjugate  mirror  filter. 
Let  xp{t)  be  function  whose  Fourier  transformation  is  defined  by 

iiu,]  =  )  (4.19) 

with 

g{u)  =  0  tt)  (4.20) 

Let  us  denote 

1  f  _  2^n 

(4-21) 

then,  for  any  scale  2f  {'fj,n}ne’i.  (in  orthonormal  basis  of  Oj.  For  all  scales, 
an  orthonormal  basis  o/L^(M). 
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The  orthogonal  projection  of  a  signal  /  G  L^  (M)  in  a  detailed  space  Oj  is  then  obtained 
with  a  partial  expansion  in  its  wavelet  basis 

+  00 


POjf  —  >  '4’j,r, 


(4.22) 


A  signal  expansion  in  a  wavelet  orthonormal  basis  can  thus  be  viewed  as  an  aggrega¬ 
tion  of  details  at  all  the  scales  2^  that  goes  from  —  cx)  to  -|-(X), 

+00  +00  +  00 


-Poj  /  —  <  /,  >  '4’j,n- 


(4.23) 


3=-oo 


j=—oo  n=—oo 


Many  applications  using  wavelet  decomposition  desire  efficient  approximations  of  par¬ 
ticular  classes  of  functions  by  a  few  non-zero  coefficients.  This  usually  requires  optimiz¬ 
ing  the  design  of  ?/)(•)  to  produce  maximum  number  of  wavelet  coefficients  <  /,  > 

that  are  close  to  zero.  The  actual  number  of  coefficients  with  non-negligible  values 
depends  on  the  regularity  of  /,  the  number  of  vanishing  moments  of  the  analyzing 
wavelet  and  the  size  of  its  support. 

A  wavelet  has  p  vanishing  moments  [70]  if 


f 


t^'tp{t)dt  =  0,  0  <  k  <  p 


(4.24) 


The  vanishing  moment  is  crucial  to  measure  the  local  regularity  of  a  signal.  If  the 
wavelet  has  p  vanishing  moments,  then  it  can  be  shown  that  the  wavelet  transforma¬ 
tion  is  actually  a  multiscale  differential  operator  of  order  p.  This  nice  property  relates 
the  differentiability  of  /  with  its  wavelet  transform  decay  at  fine  scales. 

The  order  derivative  is  the  Fourier  transform  of  Hence 

^+oo 

^W(O)  =  / 


(4.25) 

If  a  wavelet  '0(-)  has  p  vanishing  moments,  then  by  Equation  (4.24),  and  its  first 

p  —  1  derivatives  are  zero  at  w  =  0.  Theory  (4.3)  shows  that 


V2'ijj{2u)  =  e  -|- 7r)(^(w). 


(4.26) 
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Orthogonal  Wavelet  Decomposition 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 


0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 

.1:  Orthogonal  wavelet  decomposition  of  f{t)  with  Daubechies  wavelet  at  res- 
of  2^,  j  =  —  6,  —  7, — 10,  Pv-ef  is  the  remaining  coarse  signal  approximation 
:  /,  <l)j,n  >  for  J  =  -6. 
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Since  (/i(0)  ^  0,  it  follows  that  if  a  wavelet  '0(-)  has  p  vanishing  moments,  then  its  cor¬ 
responding  conjugate  mirror  filter  h{uj)  and  its  first  p  —  1  derivatives  are  zero  at  w  =  tt, 
i.e.,  we  can  decompose  h{uj)  as 


(4.27) 


where  R{uj)  is  a  function  of 

As  mentioned  before,  the  number  of  wavelet  coefficients  of  /  with  non-negligible  val¬ 
ues  depends  on  not  only  the  number  of  vanishing  moment,  but  also  the  size  of  its 


support.  Suppose  f{t)  has  a  singularity  point  att  =  to,  and  if  to  is  inside  the  support  of 


—  n),  V(j,  n)  G  Z^,  then  the  corresponding  wavelet  coefficient 


might  have  a  large  value.  If  has  a  compact  support  of  size  N,  then  at  scale  2^, 
there  must  be  N  wavelets  whose  support  includes  the  singularity  point  at  to-  To 
minimize  the  number  of  coefficients  with  non-negligible  values,  we  have  to  choose  a 
wavelet  ?/)(•)  with  a  small  support  size. 

The  support  size  of  '0(-)  are  related  to  the  support  size  of  h{-)  and  (/>(•).  Daubechies 
[29]  showed  that  the  scaling  function  (/>(•)  has  a  compact  support  if  and  only  if  h{-) 
has  a  compact  support.  Assume  the  support  of  h{-)  and  (/>(•)  are  [Ni,  A2]  and  [Ai,  K2] 
respectively,  recall  that 


the  support  size  of  4>{t/2)  should  be  [Ni  +  Ki,N2  +  A2].  On  the  other  hand,  4>{t/2)  is  just 
a  dilation  of  (f){t),  its  support  should  be  [2Ai,  2X2].  The  equality  proves  that  if  h  has  a 
compact  support  of  [Ni,  N2],  then  the  support  of  (f){t)  is  also  [Ni,  A2]. 

Since  il>{t/2)  G  Oi  C  Vo,  it  thus  can  be  decomposed  in  {(f){t  —  n)}nei./  which  is  an 
orthonormal  basis  of  Vq, 
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n 

h[n] 

n 

h[n] 

n 

h[n] 

p  =  2 

0 

.482962913145 

p  =  2 

0 

.332670552950 

p  =  4 

0 

.230377813309 

1 

.836516303738 

1 

.806891509311 

1 

.714846570553 

2 

.224143868042 

2 

.459877502118 

2 

.630880767930 

3 

-.129409522551 

3 

-.135011020010 

3 

-.027983769417 

4 

-.085441273882 

4 

-.187034811719 

5 

.035226291882 

5 

.030841381836 

6 

.032883011667 

7 

-.010597401785 

Table  4.1:  Daubechies  filter  coefficients  for  wavelets  with  p  =  2, 3, 4  vanishing  moments. 


By  equation  (4.20),  we  have 


g[n]  =  (-1)^  ”/i[l  -  n]. 


Then  we  obtain  that 


+  00 

-  n](/)(t  -  n). 

n=—oo 


The  sum  in  the  right  hand  side  has  a  support  equal  to  [Ni  —  A2  +  1,  A2  —  +  1].  Hence 

the  support  of  is  [{Ni  —  N2  +  l)/2,  {N2  —  Ai  +  l)/2]. 

The  direct  consequence  of  the  above  derivation  is  that  tradeoff  should  be  made  be¬ 
tween  the  support  size  of  a  wavelet  and  its  number  of  vanishing  moment.  One  can 
not  have  a  wavelet  of  compact  support  with  an  arbitrary  high  vanishing  moments.  To 
ensure  a  wavelet  i/j  with  p  vanishing  moments  has  a  minimal  support,  we  need  con¬ 
struct  h{uj)  as  in  Equation  (4.27)  to  have  a  minimum  degree.  The  difficulty  is  to  design 
a  polynomial  R{uj)  of  minimum  degree  such  that 


\h{co)\^  +  \h{co  +  7r)\‘^  =  2. 


(4.28) 


Since  h[n\  is  real,  |^(w)p  is  a  even  function  and  thus  can  be  written  as  a  polynomial  in 
cos  u.  Hence  |i?(w)  ^  defined  in  Equation  (4.27)  is  a  polynomial  in  cos  cu  and  we  can  also 
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P=2  p=3 


V(t).  P=2  vW.  P=3 


(|)(t),  p=4 


V(t),  p=4 


Figure  4.2:  Daubechies  scaling  function  and  wavelets 


write  it  as  a  polynomial  P(sin^  oj /2) 

|Ma;)r  =  2(cos2|rP(sin2|). 

The  quadrature  condition  (4.28)  is  equivalent  to 

{l-yyPiy)  +  y^P{l-y)  =  l,  (4.29) 

for  any  y  =  sin^(a; /2)  G  [0, 1].  To  minimize  the  nonzero  terms  of  the  finite  Fourier  series 
h{uj),  we  must  find  the  solution  P{y)  >  0  of  minimum  degree,  which  is  obtained  with 
the  Bezout  [71]  classical  theorem  on  polynomials.  The  Bezout  theorem  proves  that  there 
exist  two  unique  polynomials  Pi{y)  and  P2{y)  such  that 

{l-yypyy)  +  yPP2iy)  =  l. 


It  can  be  verified  that  P{y)  =  Pi{y)  =  P2(l  —  y)  with 

p  —  1  +  k 


p-i 

p(v)  =  E 

k=0 


k 
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Daubeohies  i|)(t),  p=8 


Daubeohies  p=8 


Symmlet  p=8 


Figure  4.3:  Symmlet  scaling  functions  and  wavelets 

Clearly  P{y)  >  0  for  y  =  sin^(t(j/2)  G  [0, 1].  Hence  P{y)  is  the  polynomial  of  the  mini¬ 
mum  degree  satisfying  the  Equation  (4.29). 

Now  we  need  to  construct  a  minimum  degree  polynomial 

m  m 

i?(e-)  =  ^  =  ro  n(l  “ 

k=0  k=0 

such  that  |i?(e*“)p  =  P(sin^  tj/2).  Let  2;  =  we  obtain 

R{z)R{z~^)  =  rl  ]^(1  -  akz){l  -  akZ~^)  =  P{ - ^  )  =  Q{z)  (4.30) 

k=0 

Since  Q{z)  =  R{z)R{z~^)  has  real  coefficients,  if  Ck  is  a  root  of  Q{z),  then  4,  1/ck 
and  1/c^  are  roots  of  Q{z)  as  well.  To  design  R{z)  satisfies  Equation  (4.30),  we  choose 
each  root  ak  of  R{z)  among  a  pair  (cjt,  1/ck)  and  include  a],  to  obtain  real  coefficients. 
This  procedure  yields  a  polynomial  of  minimum  degree  m  =  p  —  1.  the  resulting  fil¬ 
ter  h  of  minimum  size  has  N  =  p-\-m-\-l  =  2p  nonzero  coefficients.  Among  all  the 
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possible  factorizations,  the  minimum  phase  solution  is  obtained  by  choosing 

among  1/c/;)  to  be  inside  the  unit  circle  \ak\  <  1.  The  resulting  causal  filter  h  is  a 
Daubechies  filter  of  order  p.  Table  (4.1)  gives  the  coefficients  of  these  Daubechies  filters 
for  p  =  2,3,  4. 

If  -0  is  a  wavelet  with  p  vanishing  moments  that  generates  an  orthonormal  basis  of 
L^(E),  then  it  has  a  support  size  larger  or  equal  to  2p  —  1.  A  Daubechies  wavelet  [29]  cal¬ 
culated  with  Daubechies  filter  banks  has  a  minimum  size  of  support  equal  to  [—p+l,p\. 
The  support  of  the  corresponding  scaling  functions  0  is  [0,  2p  —  1], 

As  a  example.  Figure  (4.2)  shows  Daubechies  scaling  functions  0  and  wavelet  0  at  scale 
j  =  3  with  p  =  2,  3,  4  vanishing  moments  respectively. 

Daubechies  wavelets  are  very  asymmetric  because  they  are  constructed  by  selecting  the 
minimum  phase  square  root  of  Q{z)  in  Equation  (4.30).  One  can  show  that  filters  cor¬ 
responding  to  a  minimum  phase  square  root  have  their  energy  optimally  concentrated 
near  the  starting  point  of  their  support.  They  are  thus  highly  non-symmetric,  which 
yields  very  asymmetric  wavelets. 

To  obtain  symmetric  or  antisymmetric  wavelet,  the  filter  h  must  be  symmetric  or  anti¬ 
symmetric  with  respect  to  the  center  of  its  support,  which  means  that  h{u)  has  a  linear 
phase.  Haar  filter  is  the  only  real  compactly  supported  conjugate  mirror  filter  that  has 
a  linear  phase.  The  Symmlet  filters  of  Daubechies  are  obtained  by  optimizing  the  choice 
of  the  square  root  of  Q{z)  to  obtain  an  almost  linear  phase.  The  resulting  wavelet  still 
have  a  minimum  support  [—p  4- 1,  p]  with  p  vanishing  moments  but  they  are  more  sym¬ 
metric,  as  illustrated  in  Figure(4.3). 


4.3  Fast  Orthogonal  Wavelet  Transform 

A  fast  wavelet  transform  decomposes  successively  each  approximation  Py.  into  a  coarser 
resolution  Pvj+i  plus  the  wavelet  coefficients  carried  by  Poj+i  ■  In  the  reverse  direction, 
the  reconstruction  from  wavelet  coefficients  recovers  each  Py.  from  Pvj+i  and  Poj+i  ■ 
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Since  {4>j,n}nei.  {^j,n}nez  are  orthonormal  basis  of  Vj  and  Oj,  the  projection  of 
/  G  L^  (M)  on  V  j  is  characterized  by 

L  +00  +00 

^v./=  E  E  dAn]i’j,n+  E  VL  >  J  (4.31) 

j=J+l  n=—oo  n=—QO 

where 

=<  f,  4>j,n  >,  (j,  n)  e  lA  (4.32) 

and 

dj[n]  =<  f,'ipj,n  >,  {j,n)  e  (4.33) 

The  following  theorem  [67]  shows  that  these  coefficients  are  calculated  with  a  cascade 
of  discrete  convolution  and  subsampling.  We  denote  x[n\  =  x[—n\  and  a  zero  interpo¬ 
lation  of  x[n\  as 

{x\k],  n  =  2k 
0,  n  =  2k  +  I 

Theorem  4.4  Let  (f)  be  a  scaling  function  and  {h,g}  are  its  corresponding  conjugate  mirror 
filters.  At  the  decomposition, 

+  00 

aj-^-i[k]  =  h[n  —  2k]aj[n]  =  Oj -k  h[2k]  (4.35) 

n=—oo 

+  00 

dj^i[k]  =  g[n  —  2k]aj[n\  =  Oj -k  A‘^k]  (4.36) 

n=—oo 

At  the  reconstruction, 

+  00  +00 

“  2n]aj+i[n]  -|-  g[k  —  2n](ij+i[n]  =  Sj+i  -k  h[k]  +  dj+i  -k g[k].  (4.37) 

n=— 00  n=—oo 

Theory  (4.4)  proves  that  a^+i  and  dj+i  are  computed  by  taking  every  other  sample  of 
the  convolution  of  aj  with  h  and  g  respectively,  as  illustrated  by  Figure  (4.4a).  The  fil¬ 
ter  h  removes  the  higher  frequencies  of  the  inner  product  sequence  aj  whereas  ^  is  a 
high-pass  filter  which  collects  the  remaining  highest  frequencies.  The  reconstruction  is 
an  interpolation  that  inserts  zeros  to  expand  a^+i  and  dj+i  and  filter  these  signals,  as 
shown  in  Figure  (4.4b). 
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aj+2 


d 


j+2 


(a) 


aj+2 


d 


j+2 


(b) 

Figure  4.4:  (a)  A  fast  wavelet  transform  is  computed  with  a  cascade  of  filters  with  h  and 
g  followed  by  a  factor  2  subsampling,  (b)  A  fast  inverse  wavelet  transform  reconstructs 
progressively  each  aj  by  inserting  zeros  between  samples  of  a^+i  and  dj+i,  filtering  and 
adding  the  outputs. 

The  decomposition  of  a  discrete  signal  in  conjugate  mirror  filters  {h,g}  can  be  inter¬ 
preted  as  an  expansion  in  a  basis  of  I^(Z),  the  resulting  family  {h[n  —  2l],g[n  —  is 

an  orthogonal  basis  of  I^(Z).  An  orthogonal  wavelet  representation  of  aj  =<  /,  (f)j^n  > 
is  composed  of  wavelet  coefficients  of  /  at  scales  2-^  <  2^  <  2^  plus  the  remaining 
approximation  at  the  largest  scale  2^, 

[  {(^j}j<j<L,  (iL  ] 

It  is  computed  by  iterating  Equation  (4.35)  and  (4.36)  for  J  <J  <L.  Figure  (4.1)  gives 
a  numerical  example  computed  with  the  Daubechies  filter  of  order  p  =  4.  The  original 
signal  aj  is  recovered  from  this  wavelet  representation  by  iterating  the  reconstruction 
Equation  (4.37)  for  J  <  j  <  L. 

The  conjugate  mirror  filters  are  often  used  in  filter  banks  that  cascade  several  levels  of 
filtering  and  subsampling.  It  is  thus  necessary  to  understand  the  behavior  of  such  a 
cascade  [72].  In  a  wavelet  filter  bank  tree,  the  output  of  the  low-pass  filter  h  is  sub- 
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decomposed  whereas  the  output  of  the  high-pass  filter  g  is  not.  This  is  illustrated  in 
Figure  (4.4).  Suppose  that  the  sampling  distance  of  the  original  signal  is  N~^.  We 
denote  aj[n]  this  discrete  signal,  with  2'’  =  N~^.  At  the  depth  J  >  J  of  this  filter  bank 
tree,  the  low-pass  signal  aj  and  high-pass  signal  dj  can  be  written  as 

aj[n\  =  aj  -k  ^j[2^~'^n\  =<  aj[k],  (f)j[k  —  2^~'^n\  > 


and 

dj[n\  =  aj  -k  ipj[2^~'^n\  =<  aj[k],ipj[k  —  2^~'^n\  >  . 

The  Fourier  transform  of  these  equivalent  filters  are 

j-j-i 

d(w)  =  Y[  h{2Pu)  (4.38) 

p=0 

and 

j-J-2 

^{u)  =  g{2^-^-^u)  J]^  h{2Pu).  (4.39) 

p=0 

A  filter  bank  tree  of  depth  L  —  J  >  0  decomposes  aj  over  the  family  of  vectors 

[  {Mk  -  Mk  -  2^-'n]}j<,<L,nGZ  ]  •  (4.40) 


For  conjugate  mirror  filters,  one  can  verify  that  this  family  is  an  orthonormal  basis  of 
I^(Z).  These  discrete  vectors  are  close  to  a  uniform  sampling  of  the  continuous  time 
scaling  functions  (f)j{t)  =  2~^/‘^(f>(2~H)  and  wavelets  =  2~^/‘^ip{2~H).  When  the 
number  L  —  J  oi  successive  convolutions  increases,  one  can  verify  that  (j)j  [n]  and  'ipj  [n] 
converges  respectively  to  and  We  therefore  refer  the 

Equation  (4.40)  to  as  a  Discrete  Wavelet  Basis  of  I^(Z). 


4.4  Filter  Banks  and  Biorthogonal  Wavelets 

The  fast  discrete  wavelet  transform  decomposes  signal  into  low-pass  and  high-pass 
components  subsampled  by  2,  the  inverse  transform  performs  the  reconstruction.  Study 
of  such  classical  multirate  filter  banks  became  a  major  signal  processing  topic  in  1976, 
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Figure  4.5:  A  two  channel  perfect  reconstruction  filter  banks 


when  Croisier,  Esteban  and  Galand  [73]  discovered  that  it  is  possible  to  perform  such 
decompositions  and  reconstructions  with  quadrature  mirror  filters.  However,  besides  the 
simple  Haar  filter,  a  quadrature  mirror  filter  can  not  have  a  finite  impulse  response. 
In  1984,  Smith  and  Barnwell  [74]  and  Mintzer  [75]  found  necessary  and  sufficient  con¬ 
ditions  for  obtaining  perfect  reconstruction  orthogonal  filters  with  a  finite  impulse  re¬ 
sponse,  that  they  called  conjugate  mirror  filters.  The  theory  was  completed  by  the  biorthog- 
onal  mirror  filter  equations  of  Vetterli  [76,  77].  We  follow  this  digital  signal  processing 
approach  which  gives  a  simple  understanding  of  perfect  reconstruction  filter  banks. 


4.4.1  Perfect  Reconstruction  Filter  Banks 

A  two-chaimel  multirate  filter  bank  convolves  a  signal  oq  with  a  low-pass  filter  h[n]  = 
h[n\  and  a  high-pass  filter  g[n\  =  g[n\  and  subsamples  the  output  by  2, 

ai[n\  =  ao -k  h[2n\  and  di[n\  =  Oq -k  g[2n\.  (4.41) 

A  reconstructed  signal  do  is  obtained  by  filtering  the  zero  interpolated  signals  with  a 
dual  low-pass  filter  h  and  a  dual  high-pass  filter  g,  as  shown  in  Figure  (4.5).  With  the 
zero  insertion  notation  (4.34),  it  yields 

SqM  =  di  k  h[n] di  k  g[n].  (4.42) 

The  following  theorem  by  Vetterli  [77]  gives  biorthogonal  conditions,  which  guarantee 
that  ciQ  =  do- 
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Theorem  4.5  The  filter  bank  performs  an  exact  reconstruction  for  any  input  signal  if  and  only 

if 

h*{uj  +  7r)h{uj)  g*{uj  +  7r)g{uj)  =  0,  (4.43) 


and 

h*{uj)h{uj)  g*{uj)g{uj)  =  2. 


(4.44) 


Theory  (4.5)  proves  that  the  reconstruction  filters  h  and  g  are  entirely  specified  by  the 
decomposition  filters  h  and  g.  In  the  matrix  form,  the  biorthogonal  condition  can  be 
written  as. 


- 1 

_ J 

h*{uj) 

2 

X 

= 

h{uj  +  tt)  g{u}  +  tt) 

^*(w) 

0 

For  the  finite  impulse  response  filters,  there  exist  a  e  M  and  I  E  Z  such  that 


g{uj)  =  ae  +  n)  and  g{uj)  =  a  h* {uj  +  n) . 


The  factor  a  is  a  gain  which  is  inverse  for  the  decomposition  and  reconstruction  filters 
and  I  is  a  reverse  shift.  We  generally  set  a  =  1  and  /  =  0.  The  internal  relationship 
between  biorthogonal  filter  banks  become 

g{u)  =  +  tt)  and  g{u)  =  +  tt).  (4.45) 

In  the  time  domain.  Equation  (4.45)  can  be  written  as 

g[n]  =  (— 1)^“”^[1  —  n]  and  g[n]  =  (— —  n].  (4.46) 

The  two  pairs  of  filters  {h,  g}  and  {h,  g}  plays  a  symmetric  role  and  can  be  inverted. 
The  biorthogonal  condition  simplifies  to 

h*{uj)h{uj)  +  h*(uj  +  7r)h{uj  +  tt)  =  2.  (4.47) 

If  we  impose  that  the  decomposition  filter  h  is  equal  to  the  reconstruction  filter  h,  then 
Equation  (4.47)  is  exactly  the  condition  of  conjugate  mirror  filters  given  in  Equation 
(4.28). 
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4.4.2  Biorthogonal  Wavelets 


An  infinite  cascade  of  perfect  reconstruction  filters  {h,  g}  and  [h,  g}  yields  two  scaling 
functions  and  wavelets  whose  Fourier  transforms  satisfy 


=  Y[ 

p=i 


h{2-Puj) 


+  00 


and  (f){uj)  = 

p=i 


h{2-Pu) 


(4.48) 


and 

One  [71,  78]  can  proves,  with  some  additional  conditions,  (f>  and  ^  are  the  Fourier  trans¬ 
forms  of  finite  energy  functions.  The  two  wavelet  families  {ipj,n}{j,n)e^‘^  {'ipj,n}{j,n)ez^ 
are  biorthogonal  Riesz  bases  of  L^  (M) .  Biorthogonality  means  that  for  any  ( ji ,  j2 ,  ^ 


<  ^i2,n2  >=  -  h,  ni-  712) 


(4.50) 


Any  /  G  L^(]R)  has  two  possible  decompositions  in  these  bases, 

+00  +  00  +00  +  00 

/  =  ^  /)  ^  '^j,n  =  ^  /)  ‘^j,n  >  '0j,n- 

j=—oo  n=—oo  j=—oo  n=—oo 

The  Riesz  stability  implies  that  there  exist  A  >  0  and  B  >  A  such  that 

+  00  +00 

j=—oo  n=—oo 


(4.51) 


and 

+00  +00 

bIi/ip<  E  E  i</.V’i,«>r"<^ii/r 

j=—oo  n=—oo 

Biorthogonal  wavelet  bases  are  related  to  multiresolution  approximations.  The  family 
{(f){t—n)}nei.  is  a  Riesz  basis  of  the  space  Vo  it  generates,  whereas  {^{t—n)}nei.  is  a  Riesz 
basis  of  another  space  Vq.  Let  Vj  and  be  the  spaces  spaimed  by  {</>j,n(^)}nez  and 
{^i,n(^)}nez  respectively.  One  can  verify  that  {V jjnez  and  {V j}nez  are  two  multireso¬ 
lution  approximations  of  L^(M).  For  any  j  e  Z,  the  dilated  {'il>j,n{t)}nez  and  {tpj,n{t)}nez 
are  bases  of  two  detail  spaces  {Oj}nez  and  {Oj}nez  such  that 


Vj  ©  Oj  —  Vj-i 
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and 

V,  ©  6,  =  v,_i. 

The  biorthogonality  of  the  decomposition  and  reconstruction  wavelets  implies  that  Oj 
is  not  orthogonal  to  Vj ,  but  is  to  Vj,  whereas  Oj  is  not  orthogonal  to  Vj ,  but  is  to  Vj. 
The  support  size,  the  number  of  vanishing  moments,  the  regularity  and  the  symmetry 
of  biorthogonal  wavelets  is  controlled  with  an  appropriate  design  of  h  and  h. 

Similar  to  the  case  of  orthogonal  wavelets,  one  can  show  that  if  h[n]  and  h[n]  are  nonzero 
respectively  for  Ni  <  n  <  N2  and  Ni  <  n  <  N2,  then  (j)  and  (j)  have  a  support  equal  to 
[Ni,  N2]  and  [Ni,  N2]  respectively  By  Equation  (4.46),  notice  that 

g[n]  =  —  n],  and  g[n]  =  (— —  n] 

the  support  of  'ip  and  ip  are  respectively  [{Ni  —  N2  +  l)/2,  {N2  —  iVi  +  1) /2]  and  [{Ni  — 
N2  +  l)/2,  {N2  —  Ni  -\- 1)/2].  Both  wavelets  thus  have  the  same  size  of  support,  which 
is  equal  to  {N2  —  Ni  +  N2  —  Ni)/2. 

Since  (^(0)  ^  0  and  (p(0)  ^  0,  Equation  (4.25)  and  (4.49)  show  that  the  number  of  van¬ 
ishing  moments  of  pj  and  p)  depends  on  the  number  of  zeros  at  a;  =  0  of  ^(w)  and  g{uj) 
respectively.  Notice  g{uj)  =  {uj  +  tt)  and  g{uj)  =  +  tt)  for 

some  a  e  M  and  I  E  Z,  we  conclude  that  tp  has  p  vanishing  moments  if  and  only  iih{uj) 
has  a  zero  of  order  p  at  co  =  n,  whereas,  pj  has  p  vanishing  moments  if  and  only  it  h{uj) 
has  a  zero  of  order  p  at  u  =  tt.  On  the  other  hand,  the  smoothness  of  (p  and  pj  can  be 
related  to  the  order  of  zeros  ot  h{uj)  at  uj  =  tt  [70].  This  is  intuitively  make  sense  by 
Equation  (4.48)  and  (4.49),  the  more  number  of  zeros  ot  h{uj)  at  uj  =  n,  the  smoother  (p 
and  pj  will  be.  The  following  remark  summarizes  the  relationship  between  the  number 
of  vanishing  moments  and  the  regularity  of  biorthogonal  wavelets. 

Remark:  Let  {h,  g}  and  {h,  g}  he  perfect  reconstruction  filter  banks,  and  {(p,  pj}  and  {f,  pj} 
are  biorthogonal  scaling  functions  and  wavelets  generated  from  {h,g}  and  {h,g}.  Let  the  or¬ 
der  of  zeros  of  h{u)  and  h{u)  at  u  =  n  be  p  and  p  respectively,  then  the  regularity  of  cp  and 
pj  increases  with  p,  which  is  the  vanishing  moments  of  ip,  similarly,  the  regularity  of  ip  and  ip 
increases  with  p,  which  is  the  vanishing  moments  ofpj. 
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n 

P,P 

h[n\ 

h[n\ 

0 

V2/2 

45a/2/64 

-1,1 

p  =  2 

a/2/4 

19a/2/64 

-2,2 

p  =  2 

-a/2/8 

-3,3 

-3a/2/64 

-4,4 

3a/2/128 

0,1 

3^2/8 

11025^2/16384 

-1,2 

p  =  2 

V2/8 

-307a/2/16384 

-2,3 

p  =  2 

-3489a/2/16384 

-3,4 

363a/2/16384 

-4,5 

865a/2/16384 

-5,6 

-195a/2/16384 

-6,7 

-105V2/16384 

-7,8 

35^2/16384 

Table  4.2:  Perfect  reconstruction  filters  h  and  h  for  compactly  supported  spline 
biorthogonal  wavelets  with  p  and  p  vanishing  moments. 


Since  ip  and  ip  may  not  have  the  the  same  regularity  and  number  of  vanishing  moments, 
the  two  reconstruction  formulas 

+  00  +00 

j=—oo  n=—oo 

and 

+  00  +00 

/  =  <  /;  '’Pjy-n  >  '4’j,n 

j=—oo  n=—oo 

are  not  equivalent.  To  produce  small  wavelet  coefficients  in  the  regular  regions,  we 
must  compute  the  inner  products  using  the  wavelet  with  the  maximum  number  of  van¬ 
ishing  moments.  The  reconstructions  is  then  performed  with  the  other  wavelet,  which 
is  generally  the  smoother  one. 

It  is  possible  to  construct  smooth  biorthogonal  wavelets  of  compact  support  which  are 
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(p  =  2,p  =  4) 


(p  =  2,p  =  4)  (p  =  3,p  =  7)  (p  =  3,p  =  7) 
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Figure  4.6:  Spline  biorthogonal  scaling  functions  and  wavelets  for  (p  =  2,p  =  4)  and 
(p  =  3,  p  =  7)  at  scale  j  =  3. 


either  symmetric  or  antisymmetric.  This  is  impossible  for  orthogonal  wavelets,  except 
the  particular  case  of  the  Haar  basis.  Symmetric  or  antisymmetric  wavelets  are  syn¬ 
thesized  with  perfect  reconstruction  filters  having  a  linear  phase.  This  is  a  desirable 
property  for  many  applications. 

The  biorthogonal  wavelets  with  a  minimum  size  of  support  are  constructed  with  a 
technique  introduced  in  [78],  which  is  similar  to  the  construction  of  Daubechies  wavelets. 
As  an  example.  Table  (4.2)  gives  the  coefficients  of  perfect  reconstruction  filters  of  com¬ 
pactly  supported  spline  wavelets  for  (p  =  2,p  =  4)  and  (p  =  3,p  =  7).  The  resulting 
symmetric  and  antisymmetric  biorthogonal  wavelets  and  scaling  functions  are  shown 
in  Figure  (4.6). 
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4.4.3  Lifting  Wavelets 

A  lifting  is  an  elementary  modification  of  perfect  reconstruction  filters,  which  is  used 
to  improve  the  wavelet  properties.  Compactly  supported  biorthogonal  wavelet  bases 
can  be  constructed  from  finite  impulse  response  biorthogonal  filters  {h,g,h,g}  which 
satisfy 


h*{uj)h{uj)  +  h*{ijj  +  Tr)h{uj  +  tt)  =  2 


and 


g{uj)  =  e  +  and  g{uj)  =  e  +  tt). 

The  filter  h  and  h  are  said  to  be  dual.  The  following  theorem  [79]  characterizes  all  filters 
of  compact  support  that  are  dual  to  h. 

Theorem  4.6  Let  h  and  h  be  dual  filters  with  a  finite  support.  A  filter  with  finite  support  is 

dual  to  h  if  and  only  if  there  exists  a  finite  filter  I  such  that 


hfu)  =  h{u)  +  e  +  7r)r(2aj). 


(4.52) 


This  theory  proves  that  if  {h,  g,  h,  g}  are  biorthogonal  then  we  can  construct  a  new  set 
of  biorthogonal  filters  {h\  g,  h,  g'^}  with 


hfu)  =  h{u)  +  g{u)r{2u) 


and 


g\uj)  =  e  +  tt)  =  —  h{uj)l{2u). 

The  inverse  Fourier  transform  of  Equation  (4.53)  and  (4.53)  gives 


(4.53) 


(4.54) 


+  00 


h!‘[n\  =  h[n\  +  g[n  —  2k]l[—k] 


k=—oo 


and 


g\n\  =  g[n]  -  h[n  -  2k]l[k]. 


(4.55) 


(4.56) 


k=—oc 


The  new  filters  are  said  to  be  lifted  because  the  use  of  I  can  improve  their  properties.  A 
new  set  of  biorthogonal  wavelet  bases  can  be  derived  from  the  lifted  filter  banks  [80]. 
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aj[n] 


[n] 


dj+1  [n] 


(a) 


aj+i 


[n] 


(b) 


Figure  4.7:  The  biorthogonal  filter  banks  with  a  lifting  and  a  dual  lifting. 


Let  {d,  i>,  'tp}  be  a  family  of  compactly  supported  biorthogonal  scaling  functions  and 
wavelets  associated  with  the  filter  bank  {h,  g,  h,  g}.  Let  I  be  a  finite  sequence.  A  new 
family  of  biorthogonal  scaling  functions  and  wavelets  is  defined  by 

+  00 

=  y/2  {h[k]<j)^{2t  -k)  +  l[-k]ij^{t  -  k)) 

k=—oo 
+  00 

xp^{t)  =  y/2  Y  9[kW{‘^t  -  k) 

k=—oo 

+  00 

xp^{t)  =  p){t)  -  Y  -  *)•  (4-57) 

k=—oo 

If  '0^}  defined  in  Equation  (4.57)  have  finite  energy,  then  {05,„}(j»ez  and  {05,„}(j,n)ez 

are  biorthogonal  wavelet  bases  of  L^(Z). 

The  lifting  increases  the  support  size  of  0  and  ip  typically  by  the  length  of  the  support 
of  1.  Design  procedures  compute  minimum  size  filters  I  to  achieve  specific  properties. 
Section  (4.4.2)  points  out  that  the  regularity  of  (p  and  ip  and  the  number  of  vanishing 
moments  of  0  is  determined  by  the  order  of  zeros  oi  g{uj)  at  a;  =  0.  The  coefficients  of 
l[n\  are  often  calculated  to  produce  a  lifted  transfer  function  with  more  zeros  at 
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a;  =  0. 

To  increase  the  number  of  vanishing  moments  of  'ip  and  the  regularity  of  ^  and  pj,  we 
use  a  dual  lifting  which  modifies  h  and  hence  g  instead  of  h  and  g.  The  corresponding 
lifting  formula  with  a  filter  L[k]  are  obtained  by  switching  h  with  g  and  h  with  g  in 
Equation  (4.55)  and  (4.56) 

+  00 

g^[n]  =  y[n]  +  E  h[n  -  2k]L[-k]  (4.58) 

k=—oo 

and 

+  00 

h^[n]  =  h[n]  —  g[n  —  2k]L[k].  (4.59) 

k=—oo 

The  resulting  family  of  biorthogonal  scaling  functions  and  wavelets  {(/),  -0^,  can 

be  constructed  in  the  similar  way  in  Equation  (4.57). 

Successive  iteration  of  lifting  and  dual  lifting  can  improve  the  regularity  and  vanishing 
moments  of  both  0  and  0  by  increasing  the  order  of  zeros  of  g{uj)  and  g{uj)  at  w  =  0.  A 
block  diagram  of  biorthogonal  filter  banks  with  a  lifting  and  a  dual  lifting  is  shown  in 
Figure  (4.7). 

Example:  Lazy  Wavelets  Lazy  filters  h[n\  =  h[n\  =  5[n\  and  g[n\  =  g[n\  =  S[n  —  1]  satisfy 
the  biorthogonality  conditions  (4.45).  Their  Fourier  transforms  are 

h{uj)  =  h{uj)  =  1  and  ^(w)  =  g{uj)  =  (4.60) 

The  resulting  filter  bank  just  separate  the  even  and  odd  samples  of  a  signal  without 
filtering.  The  lazy  scaling  functions  and  wavelets  associated  with  these  filters  are 

0(0  =  0(0  =  5{t)  and  0(0  =  0(0  =  -  ^). 

Apparently  they  do  not  belong  to  L^(E).  These  wavelets  can  be  transformed  into  finite 
energy  functions  by  appropriate  liftings. 

A  lifting  of  a  lazy  filter  g(u)  =  yields 

g^(co)  =  —  l(2u). 
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To  produce  a  symmetric  wavelet  e*‘^/(2aj)  must  be  even.  It  can  be  verified  [81]  that  the 
shortest  I  that  lifts  lazy  wavelet  to  have  4  vanishing  moments  is  defined  by 


i{2uj)  =  e 

Insert  it  in  Equation  (4.53)  gives 

1 


9  1 

—luj  /  ^  O 

*  -  COS  U - cos  6UJ 

8  8 


— 3iaj 


9  9  1 

+  — e-“  +  1  +  — - 

16  16  16 


The  resulting  (f)^  is  the  Deslauriers-Dubuc  [81]  interpolating  scaling  function  of  order  4, 
and  —  1).  Both  cf)^  and  ip’'  are  continuously  differentiable  but  cp  and  ip’ 

are  still  sums  of  Diracs.  A  dual  lifting  can  transform  them  into  finite  energy  functions 
by  creating  a  dual  lifted  filter  with  one  or  more  zeros  at  w  =  0. 

Any  biorthogonal  filters  {h,  g,  h,  g}  can  be  synthesized  with  a  succession  of  lifting  and 
dual  lifting  applied  to  the  lazy  filters  defined  in  Equation  (4.60),  up  to  shifting  and  mul¬ 
tiplicative  constants. 


4.5  Separable  Wavelet  Bases 

To  any  orthonormal  wavelet  basis  {'0j,n}(i,n)ez2  L^(]R),  one  can  associate  a  separable 

orthonormal  basis  of  L^(M^), 

,ni  (^2)}( 

The  functions  ipj^^m  {ti)tpj2,n2  (h)  rriix  information  at  two  different  scales  2^^  and  2^^  along 
ti  and  t2,  which  we  often  want  to  avoid.  Separable  multiresolution  leads  to  another  con¬ 
struction  of  separable  wavelet  bases  whose  elements  are  products  of  one  dimensional 
scaling  functions  and  wavelets  dilated  at  the  same  scale.  These  multiresolution  approx¬ 
imations  also  have  important  applications  in  computer  vision,  where  they  are  used  to 
process  images  at  different  level  of  details.  Lower  resolution  images  are  represented  by 
fewer  pixels  and  might  still  carry  enough  information  to  perform  a  recognition  tasks. 
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Let{Vj},j  e  Z  be  a  multiresolution  approximation  of  L^(M).  A  separable  two-dimensional 
multiresolution  is  composed  of  the  tensor  product  spaces 

V|  =  V,0V,. 

Theory  (4.1)  shows  the  existence  of  a  scaling  function  (j)  such  that  {(t>j,n}n&  is  an  or¬ 
thonormal  basis  of  Y j.  By  the  classical  theory  of  functional  analysis,  one  can  proves 
that  for  t  =  (^1,^2)  and  n  =  (ni,  712) 

is  an  orthonormal  basis  of  V|.  It  is  obtained  by  scaling  the  separable  scaling  function 
4>^{t)  =  (f){ti)(f)(t2)  and  translating  it  onto  a  two  dimensional  grid  with  intervals  of  2^. 

Let  be  the  detail  space  equal  to  the  orthogonal  complement  of  the  lower  resolution 
approximation  space  in  i.e., 

©  Wl 

To  construct  a  wavelet  orthonormal  basis  of  L^(M^),  the  following  theory  [70]  builds  a 
wavelet  basis  of  each  detail  space  W|. 

Theorem  4.7  Let  cf)  be  a  scaling  function  and  be  the  corresponding  wavelet  generating  a 
wavelet  orthonormal  basis  o/L^(M).  VSfe  define  three  wavelets: 

f^{t)  =  f^{t)  =  ^{ti)(f){t2),  f^it)  =  f{ti)f{t2),  (4.61) 


and  denote  for  1  <  k  <3, 


}_,k  ( ^1  -  2%i  t2  -  2^n2\ 

V  ’  2^  J  ' 


The  wavelet  family 


{'0j,n5 


is  an  orthonormal  basis  0/ W|  and 


(4.62) 


is  an  orthonormal  basis  o/L^(E^). 
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(a)  An  image  of  toolbox 


(b)  2D  wavelet  decomposition  of  toolbox 


Figure  4.8:  Wavelet  decomposition  of  a  toolbox  image 

The  three  wavelets  extract  image  details  at  different  scales  and  orientations.  Over  pos¬ 
itive  frequencies,  (f>  and  have  an  energy  mainly  concentrated  respectively  on  lower 
and  higher  frequencies.  Let  ca  =  {uji,uj2),  the  separable  wavelet  expressions  implies 
that 

=  <^(wi)'0(w2),  (4.63) 

Hence  |v^^(<a)|  is  larger  at  low  horizontal  frequencies  uji  and  high  vertical  frequencies 
UJ2,  is  larger  at  high  horizontal  frequencies  uji  and  low  vertical  frequencies  102, 

whereas  |  |  is  larger  at  high  horizontal  frequencies  uji  and  high  vertical  frequencies 

002-  As  a  result,  wavelet  coefficients  calculated  with  'tp^  and  -0^  are  larger  along  edges 
which  are  respectively  horizontal  and  vertical,  and  produces  large  coefficients  at  the 
corners.  This  is  illustrated  by  the  decomposition  of  a  toolbox  image  in  Figure  (4.8). 

In  the  similar  fashion,  one-dimensional  biorthogonal  wavelet  bases  can  also  be  ex¬ 
tended  to  separable  biorthogonal  bases  of  L^(M^).  let  0,  0  and  0, 0  be  two  dual  pairs  of 
scaling  functions  and  wavelets  that  generate  biorthogonal  wavelet  bases  of  L^(E).  The 
dual  wavelets  of  0^,  0^  and  0^  defined  by  Equation  (4.61)  are 

0^(f)  =  0(fi)0(t2),  0^(t)  =  0(fl)0(f2),  0^(t)  =  0(tl)0(f2)-  (4.64) 

It  is  easy  to  verify  that 

{''/’in)  ''/’in)  ''/’in}(i,n)ez3 
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and 

are  biorthogonal  bases  of  L^(E^). 

It  is  possible  to  extend  the  fast  one-dimensional  wavelet  transform  algorithm  to  two 
dimensions.  Let  /  €  L^(M^),  at  all  the  scales  of  2^  and  for  any  n  =  (ni,  712),  we  denote 

aj[n]  =<  /,  (Pin  > 

and 

d^^[n]=<f,7Pln>,  k  =  1,2,3. 

For  any  pair  of  one-dimensional  filters  y[n\  and  z[n\,  we  write  the  product  filter  yz[n\  = 
y[ni]z[n2],  and  denote  y[n\  =  y[—n\. 

The  wavelet  coefficients  at  the  scale  2^+^  are  calculated  from  aj  with  two  dimensional 
separable  convolutions  and  subsamplings.  Let  h  and  g  be  the  conjugate  mirror  fil¬ 
ters  associated  to  the  wavelet  ip.  The  decomposition  formula  are  obtained  by  applying 
the  one-dimensional  convolutional  formula  to  the  separable  two-dimensional  wavelets 


and  scaling  functions.  For  n  =  (ni,  112), 

aj+i[n]  =  aj-khh[n],  (4.65) 

=  aj-khg[n],  (4.66) 

=  aji^gh[n],  (4.67) 

d^j+i[n]  =  aj-kgg[n].  (4.68) 


A  separable  two  dimensional  convolution  can  be  factored  into  one-dimensional  con¬ 
volutions  along  with  rows  and  columns  of  the  images.  The  factorization  is  illustrated 
in  Figure  (4.9a).  The  rows  of  aj  are  first  convolved  with  h  and  g,  and  subsampled  by  2. 
The  columns  of  these  two  output  images  are  then  convolved  respectively  with  h  and  g 
and  subsampled,  which  gives  four  subsampled  images  a^+i,  d^_^_i,  d‘j_^_-^  and 
We  denote  y[n\  =  y[ni,  712]  the  image  obtained  by  inserting  a  row  of  zeros  and  a  column 
of  zeros  between  pairs  of  consecutive  rows  and  columns  of  y[nl,  n2].  aj  is  recovered 
from  the  coarser  scale  approximation  a^+i  and  the  wavelet  coefficients  dj+i,  d|_|_i  and 
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Rows 


Columns 


aj+i 


(a) 


Columns 


Rows 


aj+i 


(b) 


Figure  4.9:  (a)  A  two-dimensional  fast  wavelet  transform  is  computed  with  a  cascade 
of  filters  h  and  g  followed  by  a  factor  2  subsampling  in  rows  and  columns  respectively, 
(b)  A  two-dimensional  fast  inverse  wavelet  transform  reconstructs  progressively  each 
ttj  by  inserting  zeros  between  samples  of  a^+i  and  k  =  1,  2, 3,  filtering  and  adding 
the  outputs  along  with  rows  and  columns. 
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with  two-dimensional  separable  convolutions  derived  from  the  one-dimensional 
reconstruction  formula  in  Equation  (4.37) 

aj[n]  =  Sj+i  *  hh[n\  -\-  -k  hg[n]  -\-  -k  gh[n]  -k  gg[n].  (4.69) 

These  four  convolutions  can  also  be  factored  into  six  groups  of  one-dimensional  con¬ 
volutions  along  rows  and  columns,  as  illustrated  in  Figure  (4.9b). 

Let  b[n\  be  an  image  whose  pixels  have  a  distance  2"^  =  N~^.  We  associate  to  b[n\  a 
function  f{x)  E  Vj  approximated  at  the  scale  2"^.  Its  coefficients  aj[n\  =<  /,  >  are 

discrete  samples  at  scale  of  2-^ 


b[n]  =  Naj[n]  ~  f{N  ^n). 

The  wavelet  image  representation  of  aj  is  computed  by  iterating  Equation  (4.65-4.68) 
for  J  <  j  <  L: 

[  {d],d],d]]j^j<L,aL  ] 

The  original  image  aj  is  recovered  from  this  wavelet  representation  by  iterating  the  re¬ 
construction  Equation  (4.69)  for  J  <J<L. 

4.6  Signal  Estimation  in  Wavelet  Framework 

In  this  section,  we  consider  the  problem  of  signal  estimation  in  an  additive  noise  model. 
A  signal  /  G  I^(Z)  of  support  size  N  is  contaminated  by  the  addition  of  a  noise.  This 
noise  is  modeled  by  the  realization  of  a  zero  mean  random  process  W.  The  measured 
data  are 

Z[n]  =  f[n]  +  W[n],  n  G  Z.  (4.70) 

The  signal  /  is  estimated  by  transforming  the  noisy  observation  Z  with  a  decision  op¬ 
erator  D,  which  is  given  by 


f  =  DZ. 


(4.71) 
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A  statistical  approach  usually  assumes  the  knowledge  of  at  least  the  probability  distri¬ 
bution  of  the  noise  process  W.  An  optimal  D  is  to  then  minimize  the  risk  of  the  estima¬ 
tor,  which  is  the  average  loss  calculated  with  respect  to  the  probability  distribution  of 
noise 

r{D,f)  =  E{\\f-DZ\\}.  (4.72) 

Linear  operators  have  long  predominated  the  solution  to  this  problem  because  of  their 
simplicity,  despite  their  limited  performance. 

The  Bayes  framework  supposes  that  signals  /  are  realizations  of  a  random  vector  F 
whose  probability  distribution  tt  is  a  known  prior.  The  Bayes  risk  is  the  expected  risk 
calculated  with  respect  to  the  prior  probability  distribution  tt  of  the  signal: 

r{D,7r)  =  E^{r{DJ)).  (4.73) 

Then  the  Bayes  estimation  is  to  optimize  D  to  minimize  the  expected  risk  r(i7,7r).  It 
is,  however,  generally  not  possible  to  have  enough  information  to  define  this  prior 
probability  distribution  for  a  signal  set  with  a  complex  structure.  To  overcome  this 
difficulty,  one  may  call  upon  a  minimax  framework  that  applies  a  simpler  model  which 
constrains  signals  in  a  prior  set  0.  The  goal  is  to  then  find  an  optimal  operator  which 
minimizes  the  maximum  risk  over  0,  i.e., 

D  =  arginf  r{D,  0),  (4.74) 

where  the  maximum  risk  is  given  by 

r{D,e)  =  sup  r{D,f).  (4.75) 

fee 

Except  for  a  few  special  cases,  minimax  optimal  operators  are  highly  nonlinear  and 
difficult  to  find  for  real  world  applications.  More  often  than  not,  one  settles  for  a  sub- 
optimal  estimator.  This  section  studies  particular  estimators  that  are  diagonal  in  an 
orthonormal  basis  B  =  {ym}o<m<v-  If  the  basis  B  defines  a  sparse  signal  representa¬ 
tion,  then  such  diagonal  estimators  are  nearly  optimal  among  all  nonlinear  estimators. 
The  noisy  data 


X  =  f  +  W 


(4.76) 
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is  decomposed  in  B.  We  write 

Xs[m]  =<  >,  /s[m]  =< >,  Wslm]  =<W,gm>  ■ 

The  inner  product  of  Equation  (4.76)  with  gives 

Xb[iti\  =  /s[m]  +  WB[m],  0  <  m  <  X. 

We  assume  that  W  is  a  zero-mean  white  noise  of  variance  then 

N-l  N-1 

E{WB[m\WB[p]}  =  EE  g^[n]gp[k]E{W[n]W[k]} 

n=0  k=0 

—  ^  ^  9pi  9  m  ^ 

=  a‘^5{p  —  m). 

The  noise  coefficient  is  hence  also  a  white  noise  of  variance 

A  diagonal  operafor  estimates  independently  each  /^[m]  from  ^^[m]  with  a  function 
dm{x).  The  resulting  estimator  is 

N-l 

f  =  DX  =  Y,  d^{XB[m])gm.  (4.77) 

m=0 

If  sefting  ^^(0)  =  0,  we  can  write 

dm{XB[m])  =  a[m]XB[m],  0  <  m  <  N 
where  a[m\  depends  on  The  estimation  risk 


N-l 

N-l 

=  ^E{\fB[m] 

-  a[m]XB[rn]\^}  =  ^  /^[m]  ^(1  -  a[m]f  -|-  o^a[mf, 

(4.78) 

m=0 

m=0 

is  minimized  by 

r  1  I/bMP 

I/bHP  +  <7^ 

(4.79) 

and  the  minimum  risk  is 

(4.80) 

In  practice,  the  attenuation  factor  a[m]  in  Equation  (4.79)  can  not  be  computed  since  it 
depends  on  |/B[m]|,  whose  value  is  not  known.  We  hence  refers  to  Equation  (4.79)  as 
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an  oracle  attenuation. 

The  analysis  of  diagonal  estimators  can  be  simplified  by  restricting  a[m\  e  {0, 1}.  A 
non-linear  projector  that  minimizes  the  risk  in  Equation  (4.78)  is  defined  by 


J  1,  \fB[m\\>a 

a[m\  =  < 

I  0,  otherwise 

Similar  to  the  case  of  Equation  (4.79),  this  projector  can  not  be  implemented  because 
a[m\  depends  on  the  value  of  l/ejm]  |.  The  risk  of  this  oracle  projector  is  computed  with 
Equation  (4.78),  and  we  obtain, 

N-l 

rpif)  =  (4.81) 

m=0 

Since  for  any  x,y  >  0, 

inm{x,y)  >  — —  >  -mm{x,y), 
cc  y  Zi 

we  have 

Tpif)  >  rinf(/)  >  ^rp{f). 

The  risk  of  the  oracle  projector  is  of  the  same  order  as  the  risk  of  the  oracle  attenuation 
(4.80).  One  can  use  the  risk  in  (4.81)  to  verify  the  performance  of  practical  thresholding 
estimators. 

Instead  of  depending  on  a  feasible  approach  is  to  use  to  determine  an 

appropriate  projection.  A  diagonal  estimator  can  be  written  as 

N-l 

f  =  DX  =  Y,dmiXB[m])g^. 

m=0 

A  hard  thresholding  estimator  is  implemented  with 


dm{x)  =  Pt{x) 

The  risk  of  this  thresholding  is 


X,  |a;|  >  T 
0,  |x|  <  T 


N-l 

^t(/)  =  E{\fB[m]  -  pT{XB[m])\^}. 
m=0 


(4.82) 
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Since  ^^[m]  =  /^[m]  +  WbItti], 

,  f  (t2,  \XB[m\\>T 

E{\fB[m]-pT{XB[mm=\  '  ^  . 

I  |/s[”^]r)  otherwise 

Hence  the  risk  of  the  hard  thresholding  estimator  is  larger  than  the  risk  of  the  oracle 
projector, 

N-l 

nif)  >  rpif)  =  ^min(|/B[m]|^(T^). 

m=0 

An  oracle  attenuation  yields  a  risk  rinf  that  is  smaller  than  the  risk  of  rp(/)  of  an  oracle 
projection,  by  slightly  decreasing  the  amplitude  for  all  coefficients  in  order  to  reduce 
the  added  noise.  A  similar  attenuation,  although  non-optimal,  is  implemented  by  a 
soft  thresholding,  which  decreases  by  T  the  amplitude  of  all  noisy  coefficients.  This 
soft  thresholding  function  is  given  by 

X  —  T,  X  >T 


dm{x)  =  Pt{x)  =  < 


rc  +  T,  X  <  —T 


0, 


\x\  <  T 


It  is  the  solution  that  minimizes  a  quadratic  distance  to  the  data,  penalized  by  an  h 
norm.  Given  the  data  x[m],  the  vector  y[m]  which  minimizes 

N-l  N-l 

^  \y[m]  -  +  \y[m\\ 

m=0  m=0 

is  y[m\  =  pT{x[m\).  The  threshold  is  generally  chosen  so  that  it  is  just  above  almost 
all  the  noise  coefficients.  Since  Wb  is  a  vector  of  N  independent  Gaussian  random 
variables  of  variance  By  taking  T  =  aJ 2  log^,  one  can  show  that 


lim  P(max(lTB[m])  <  T)  =  1. 

-/V— >-+oo  m 

The  following  theorem  [82]  proves  that  the  risk  of  a  thresholding  estimator  is  close  to 
the  risk  of  an  oracle  projector  defined  in  Equation  (4.81). 

Theorem  4.8  Let  T  =  a^J  2  log^.  The  risk  of  a  hard  or  soft  thresholding  estimator  satisfies  for 
all  N  >  4 


nif)  <  (21ogf  +l)((T^ +  rp(/)). 
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A  filter  bank  tree  ot  depth  L  —  J  >  0  decomposes  a  discrete  signal  in  a  discrete  wavelet 
basis  defined  in  Section  4.3. 

B  =  [{(f)L[k-  {^j[k  -  ]  ■  (4.83) 

is  an  orthonormal  basis  ot  I^(Z).  A  wavelet  thresholding  estimator  can  be  written  as 

L  +00  +00 

/=  E  E  Pt{^  “i  ^  ^  Pt{^  (4.84) 

j=J+l  m=— oo  m=—oo 

where  Pt(')  a  hard  or  soft  thresholding  function.  In  a  wavelet  signal  representation, 
large  amplitude  coefficients  correspond  to  transient  signal  variations,  this  means  that 
the  thresholding  estimation  only  keeps  transients  coming  from  the  underlying  signal, 
without  adding  others  due  to  the  noise. 

The  threshold  T  =  (t  \J 2  log^  is  not  optimal  and  in  general  a  lower  threshold  reduces 
the  risk.  A  threshold  adapted  to  the  data  is  calculated  by  minimizing  an  estimation 
of  the  risk.  Denote  rt{f,T)  the  risk  of  a  soft  thresholding  estimator  calculated  with  a 
threshold  T.  An  estimate  ft(/,  T)  of  rt{f,T)  is  calculated  from  the  noisy  data  X,  and 
T  is  optimized  by  minimizing  ft{f,T).  To  estimate  the  risk  rt{f,T),  observe  that  if 
|XB[m]|  <  T  then  the  soft  thresholding  sets  this  coefficient  to  zero,  which  produces  a 
risk  equal  to  |/s[m]  p.  Since 

E{\Xn[m]n  =  \Um]\^  +  a\ 

one  can  estimate  |/B[m]p  with  |XB[m]p  —  If  |XB[m]|  >  T,  the  soft  thresholding 
subtracts  T  from  the  amplitude  of  Xslm].  The  expected  risk  is  the  sum  of  the  noise 
energy  plus  the  bias  introduced  by  the  reduction  of  the  amplitude  of  ^^[m]  by  T.  It  is 
estimated  by  +  T^.  The  resulting  estimator  ot  rt(/,  T)  is 

N-l 


(4.85) 
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It  can  be  shown  [82]  that  ft{f,  T)  is  a  Stein  Unbiased  Risk  Estimator  (SURE)  of  rt{f,  T), 
i.e.  E{h{f.T)]  =  n{f,T). 

To  find  the  T  that  minimizes  the  SURE  estimator  ft{f,  T),  the  N  data  coefficients  Xslm] 
are  sorted  in  decreasing  amplitude  order  with  0(Alog2  A)  operations.  Let  = 

XB[mk]  be  the  coefficient  of  rank  k  :  |A]j[A;]  >  \X'^[k  +  1]|  for  0  <  k  <  N  —  2.  Let  /  be  the 
index  such  that  \X'^[l  +  1]  |  <  T  <  |  |.  We  can  rewrite  Equation  (4.85): 

N-l 

f,(f,  T)  =  Y,  -  (JV  -  IW  +  ((ff"  +  r").  (4.86) 

k=l 

To  minimize  ft{f,T),  we  must  choose  T  =  \Xg[l  +  1]  |  because  ft(/,  T)  is  increasing  in  T. 
It  is  therefore  sufficient  to  compare  the  N  possible  values,  {\XB[k]\}o<k<N,  to  find  the 
T  that  minimizes  (/,  T),  that  requires  0{N)  operations  if  we  progressively  recompute 
the  formula  (4.86).  The  calculation  of  T  is  thus  performed  with  0{N  log2  N)  operations. 
Although  the  estimator  ft(/,  T)  of  rt{f,T)  is  unbiased,  its  variance  may  induce  errors 
due  to  the  noise  energy,  especially  when  II/IP  C  AjllWlp}  =  Na"^.  In  this  case,  one 
must  impose  T  =  ay/2\og^  in  order  to  remove  all  the  noise.  Since  AjUXlp}  =  ||/|p  + 
Na"^,  we  estimate  1 1  / 1  p  with  1 1 X  |  p — A(t^  and  compare  this  value  with  a  minimum  energy 
level  Cat  =  (T^A^/^(logg  A)^/^.  The  resulting  SURE  threshold  is 


T 


(7^21oggA,  \\X\\^  -  Na^  <  eM 

f,  ||A||2-Aa2>e^ 


Let  0  be  a  signal  set  and  miiir  rt(0)  be  the  minimax  risk  of  a  soft  thresholding  obtained 
by  optimizing  the  choice  of  T  depending  on  0.  Donoho  and  Johnstone  [82]  prove  that 
the  threshold  computed  empirically  with  the  above  equations  yields  a  nearly  minimax 
risk. 

Figure  (4.10)  demonstrates  the  estimation  result  of  a  noisy  piecewise  smooth  signal 
with  a  soft  threshold  estimator  with  the  SURE  threshold  T. 
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Figure  4.10:  Estimation  results  of  a  noisy  piecewise  smooth  signal  with  a  soft  threshold 
estimator  with  the  SURE  threshold  T. 
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CURRENT  approaches  to  denoising  or  signal  enhancement  in  wavelet-based  frame¬ 
work  have  generally  relied  on  the  assumption  of  normally  distributed  perturba¬ 
tions.  In  practice,  this  assumption  is  often  violated  and  sometimes,  even  prior  infor¬ 
mation  of  probability  distribution  of  the  noise  process  is  not  available.  To  relax  this 
assumption,  we  propose  a  novel  non-linear  filtering  technique  in  this  section.  The  key 
idea  is  to  project  a  noisy  signal  onto  a  wavelet  domain  and  to  suppress  wavelet  co¬ 
efficients  by  a  mask  derived  from  curvature  extrema  in  its  scale  space  representation. 
For  a  piecewise  smooth  signal,  it  can  be  shown  that  filtering  by  this  curvature  mask  is 
equivalent  to  preserving  the  signal  pointwise  Holder  exponents  at  the  singular  points 
and  lifting  its  smoothness  at  all  the  remaining  points. 
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5.1  Introduction 

Inspired  by  the  fact  that  the  human  visual  system  processes  and  analyzes  image  in¬ 
formation  at  different  scales,  researchers  have  made  extensive  use  of  the  multiscale 
analysis  in  signal  and  image  processing  applications.  Wavelet  theory  has  played  a  par¬ 
ticularly  important  role  in  multiscale  analysis  due  to  the  fact  that  the  basis  functions 
are  well  suited  to  analyze  local  scale  phenomena.  This  property  also  endows  wavelets 
with  a  remarkable  property  for  denoising  in  a  wavelet  based  framework. 

Donoho  and  Johnstone  [7]  first  showed  that  effective  noise  suppression  maybe  achieved 
by  wavelet  shrinkage.  Given  the  noisy  wavelet  coefficients,  i.e.  the  true  wavelet  coef¬ 
ficients  plus  a  noise  term,  and  assuming  that  one  has  knowledge  of  the  true  wavelet 
coefficients,  an  ideal  filter  sets  a  noisy  coefficient  to  zero  if  the  noise  variance  cP'  is 
greater  than  the  square  of  the  true  wavelet  coefficient;  otherwise  the  noisy  coefficient 
is  kept.  In  this  way,  the  mean  square  error  of  this  ideal  estimator  is  the  minimum  of  cP 
and  the  square  of  the  coefficient.  Under  the  assumption  of  i.i.d.  normal  noise,  it  can 
shown  that  a  soft  thresholding  estimator  achieves  a  risk  at  most  0(log  M)  times  the  risk 
of  this  ideal  estimator,  where  M  is  the  length  of  the  observation. 

To  choose  an  appropriate  threshold,  Donoho  and  Johnstone  [7]  have  taken  a  mini¬ 
max  approach  to  characterize  the  signal,  and  they  proved,  by  setting  a  threshold  T  = 
(7Y^2Tog^~M,  that  the  estimation  risk  is  close  to  the  minimax  risk.  Krim  and  Pesquet  [8] 
have  given  an  alternative  derivation  for  this  threshold,  using  Rissanen's  Minimum  De¬ 
scription  Length  (MDL)  criterion  [9]  and  the  assumption  of  normally  distributed  noise. 
The  threshold  T  increase  with  M  is  due  to  the  tail  of  the  Gaussian  distribution,  which 
tends  to  generate  larger  noise  coefficients  when  sample  size  increases.  This  thresh¬ 
old  is  not  optimal,  and  in  general  a  lower  threshold  reduces  the  risk.  To  refine  the 
threshold,  a  SureShrink  [10]  procedure  is  proposed.  Sureshrink  calculates  thresholds 
by  the  principle  of  minimizing  the  Stein  unbiased  estimate  of  risk  for  threshold  esti¬ 
mates.  SureShrink  is  also  based  on  the  assumption  of  i.i.d.  normal  noise.  For  non- 
Gaussian  type  of  noise,  Neumann  [11],  Averkamp  and  Houdre  [12]  studied  the  choice 
of  thresholds  by  having  recourse  to  asymptotics.  Wavelet  thresholding  theory  is,  how- 
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ever,  based  on  the  assumption  that  we  know  the  statistics  of  the  noise  to  determine  an 
adequate  threshold.  This  makes  the  algorithm  less  flexible  and  less  adaptive  to  differ¬ 
ent  scenarios  which  can  result  in  an  even  worse  reconstruction.  Compensation  for  the 
lack  of  a  prior  knowledge  of  the  noise  statistics  may  be  handled  by  adopting  the  mini- 
max  principle  [13]  upon  deriving  the  worst  case  noise  distribution. 

Points  of  sharp  variations  are  often  among  the  most  important  features  for  analyzing 
properties  of  transient  piecewise  smooth  signals.  To  characterize  the  singular  struc¬ 
tures,  Holder  exponents  [14]  provide  a  pointwise  measure  of  a  function  over  a  time 
interval.  Due  to  the  pioneering  work  by  Jaffard  [15]  and  Meyer  [16],  it  can  be  shown 
that  a  local  signal  singularity  of  a  function  is  characterized  by  the  decay  of  its  wavelet 
transform  amplitude  across  scales. 

In  this  paper  we  first  reinterpret  the  denoising  problem  as  one  of  having  to  adjust  the 
pointwise  smoothness  of  noisy  data,  and  propose  a  novel  non-linear  estimator  as  a  re¬ 
sult.  The  key  idea  is  to  separate  out  the  signal  portion  from  its  noisy  data  to  preserve 
the  original  smoothness  property  of  the  underlying  signal,  while  the  remaining  noisy 
data  admits  the  same  Holder  exponents  as  the  noise.  This  non-linear  filter  would  be 
optimal  in  the  sense  of  recovering  the  smoothness  of  the  true  underlying  signal. 

A  crucial  step  for  realizing  such  a  smoothness-constrained  filter  is  to  identify  the  sin¬ 
gularities  of  the  true  signal.  We  tackle  this  problem  with  the  theory  of  curve  evolu¬ 
tion  [83,  84],  which  is  inherently  geometric  in  nature,  and  widely  used  in  computer 
vision  [85,  86]  and  image  processing  [87].  The  basic  idea  is  that  a  planar  curve  deforms 
in  the  direction  of  its  Euclidean  normal,  with  a  speed  equal  to  its  curvature.  The  noise 
riding  on  the  signal  has  relatively  higher  curvatures  in  comparison  to  the  underlying 
signal.  It  thus  tends  to  be  smoothed  much  faster  than  the  latter.  This  disparity  in  evo¬ 
lution  speed  [38]  is  key  to  preserving  the  true  features  of  the  signal. 

With  the  knowledge  of  the  singularities,  we  proceed  to  generate  a  multiscale  curvature 
mask  to  filter  the  wavelet  transform  of  the  noisy  data.  Specifically,  we  prove  that  filter¬ 
ing  the  transform  by  a  curvature  mask  is  equivalent  to  keeping  the  pointwise  Holder 
exponent  of  the  noisy  data  at  singular  points,  and  lifting  its  smoothness  at  all  the  re- 
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maining  points.  In  addition,  residual  data  (noisy  data  minus  estimate)  admits  the  same 
Holder  exponents  as  the  noise  except  at  the  singular  points  of  the  signal. 

In  the  section  that  follows,  we  briefly  introduce  the  concept  of  Holder  exponent  and 
regularity  measurement  with  wavelets.  In  Section  5.3,  we  give  a  concise  statement 
of  the  problem.  In  Section  5.4,  we  derive  the  singularity  detection  algorithm  through 
curve  evolution.  In  Section  5.5,  we  formally  define  the  smoothness-constrained  filter, 
propose  its  implementation  and  verify  its  properties.  Some  numerical  results  appear  in 
Section  5.6.  Finally,  we  provide  concluding  remarks  in  Section  5.7. 


5.2  Regularity  measurement  with  wavelets 

To  characterize  singular  structures,  it  is  necessary  to  precisely  quantify  the  local  reg¬ 
ularity  of  a  signal.  Holder  spaces  and  Holder  exponent  provide  a  uniform  regularity 
measurement  over  time  intervals,  as  well  as  at  a  particular  point. 


Definition  5.1  Let  ICR  and  f  he  a  continuous  function  from  I  to  R.  f  is  said  to  belong  to  a 
global  Holder  space  n  >  0  if  and  only  if  for  any  v  C  I  there  exists  a  positive  constant  c 

and  a  polynomial  Py  of  degree  m  =  [nj,  [nj  denotes  the  largest  integer  m  <  a,  such  that 

\f{x)  —  Py{x  —  11)1  <  c\x  —  vl^^,  Vx  C  I. 

Definition  5.2  /  is  said  to  belong  to  a  pointwise  Holder  space  C^{xo),  a  >  0,  xq  C  I  if  and 
only  if  there  exists  a  positive  constant  c  and  a  polynomial  P^^  of  degree  m  =  [nj  such  that 

\f{x)  -  Px^{x  -  xo)|  <  c\x  -  Xol",  Mx  e  I. 

Definition  5.3  A  function  f  is  said  to  have  a  Holder  exponent  a  at  point  xq  if  there  exists  a 
polynomial  P^^  of  degree  m  =  [nj  and  f  satisfies 

•  for  any  (3  <  a, 

|/(xo  +  A)-P.„(A)| 
a“o  |A|/3 
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•  if  a  <  +oo,for  any  (5  >  a 


lim 

A^-O 


|/(xo  +  A)-P.„(A)| 
|A|/3 


=  +00. 


The  vanishing  moment  property  of  a  wavelet  function  is  crucial  to  measure  the  local 
regularity  of  a  signal.  If  a  wavelef  ftf  )  has  n  vanishing  moments,  i.e. 

/  +  00 

=  0,  0  <  k  <  n,  (5.1) 

■00 

it  can  be  shown  [70]  that  the  wavelet  transformation  is  actually  a  multiscale  differential 
operator  of  order  n.  This  nice  property  relates  the  differentiability  of  /  with  its  wavelet 
transform  decay  at  fine  scales. 

Due  to  the  pioneering  work  by  Meyer  [16]  and  Jaffard  [15],  it  can  be  shown  that  a  lo¬ 
cal  signal  singularity  is  characterized  by  the  decay  of  its  wavelet  transform  amplitude 
across  scales. 


Theorem  5.1  [15]  Let  xIj{-)  be  a  wavelet  with  n  vanishing  moments,  f  G  L^(M)  and  W f{-,  •) 
denotes  its  wavelet  transform.  Suppose  f  has  a  Holder  exponent  a  <  nat  x,  then  there  exists  a 
constant  A  such  that 

,  01  -  rr> 

y{u,  s)  e  M  X  M+, \Wf{u,  s)|  <  (1  +  l^^r). 

Conversely,  if  a  <  n  is  not  an  integer  and  there  exists  a  constant  A  and  a'  <  a  such  that 
y(u,s)  eMxE+,|lT/(ri,s)|  <  As“+^(1  + 

s 

then  f  is  Holder  aat  x. 

5.3  Problem  Formulation 

A  signal  /  G  L^(E)  is  contaminated  by  the  addition  of  a  noise.  This  noise  is  modeled 
by  the  realization  of  a  zero  mean  random  process  {N}.  The  measured  dafa  are 


Z{x)  =  f{x)  +  H(x),  X  G  E. 


(5.2) 
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The  signal  /  is  estimated  by  transforming  the  noisy  observation  Z  with  a  decision  op¬ 
erator  D,  which  is  given  by 

/  =  DZ.  (5.3) 

A  statistical  approach  usually  assumes  the  knowledge  of  at  least  the  probability  dis¬ 
tribution  of  the  noise  process  {A},  and  maybe  a  prior  distribution  of  the  signal.  An 
optimal  D  is  to  then  minimize  the  risk  of  the  estimator,  which  is  the  average  loss  calcu¬ 
lated  with  respect  to  the  probability  distribution  of  noise 

r(U,/)=A{d(/-UZ)},  (5.4) 

where  d(')  is  ^  cost  function.  It  is,  however,  generally  not  possible  to  have  enough 
information  to  define  this  prior  probability  distribution  for  a  signal  set  with  complex 
structure.  To  overcome  this  difficulty,  one  may  call  upon  a  minimax  framework  that 
applies  a  simpler  model  which  constrains  signals  in  a  prior  set  0.  The  goal  is  to  then 
find  an  optimal  operator  which  minimizes  the  maximum  risk  over  0,  i.e., 

D  =  arginf  r(U,  0),  (5.5) 

where  the  maximum  risk  is  given  by 

r(i3,  0)  =  sup  r(i3,  /).  (5.6) 

fee 

Except  for  a  few  special  cases,  minimax  optimal  operators  are  highly  nonlinear  and  dif¬ 
ficult  to  find  for  real  world  applications.  More  often  than  not,  one  settles  for  a  subopti- 
mal  estimator.  The  well  known  thresholding  estimator  in  an  orthonormal  wavelet  basis 
proposed  by  Donoho  and  Johnstone  [10]  has  a  suboptimal  risk  rt{Q)  (logg  M)rmin(0) 
for  the  set  of  piecewise  smooth  signals,  where  M  is  the  observation  length  and  rmm(0) 
is  the  minimax  risk. 

In  this  paper,  we  propose  a  novel  nonlinear  filtering  technique.  Contrary  to  statistical 
methods,  we  assume  that  a  prior  knowledge  about  pointwise  smoothness  measure  of 
the  signal  is  known  or  can  be  extracted.  However,  this  smoothness  property  of  the  sig¬ 
nal  is  corrupted  by  additive  noise,  which  in  general  has  a  uniform  Holder  exponent  less 
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than  1.  As  noted  earlier,  we  view  the  denoising  problem  as  one  of  carefully  controlling 
the  Holder  exponents  of  measured  data  with  a  goal  of  extracting  the  signal  portion  with 
some  smoothness  fidelity  to  the  original.  Let  «/(•)  and  aN{-)  characterize  the  pointwise 
Holder  exponent  of  /(•)  and  N{-)  respectively  A  ideal  operator  T  satisfies  the  following 
two  conditions: 

•  f  =  TZ  admits  Q;/(-)  as  its  pointwise  Holder  exponent 

•  V{x)  =  Z(x)  —  f{x)  admits  Q;Ar(-)  as  its  pointwise  Holder  exponent 

This  non-linear  filter  is  optimal  in  the  sense  of  recovering  the  smoothness  of  the  true 
underlying  signal. 


5.4  Curve  evolution  and  singularity  detection 


Singularities  and  irregular  structures  often  carry  the  most  important  information  in 
signals.  Many  researchers  [88],  [89],  [90],  [91]  and  [92]  have  developed  singularity  de¬ 
tection  techniques  based  on  multiscale  transforms.  In  this  Section,  we  discuss  the  evo¬ 
lution  of  a  planar  curve  in  Euclidean  space,  in  which  the  planar  curve  evolves  in  the 
direction  of  its  Euclidean  normal,  with  a  speed  equal  to  its  local  curvature.  Using  the 
well  known  foundation  of  such  an  evolution,  we  proceed  to  derive  the  partial  differen¬ 
tial  equations  that  characterize  the  evolution  of  curvature.  We  subsequently  propose  a 
new  singularity  detection  method  by  tracking  the  curvature  extrema  across  scales. 

Let  C(-,  0)  :  ^  be  a  smooth  planar  curve  in  a  Euclidean  space,  then  a  geometric 

curve  flow  [83]  [93],  C(s,  t)  :  x  [0,  T)  E^,  is  characterized  by  the  following  partial 

differential  equation 


=  W 


where  t  denotes  the  scale,  k  is  the  curvature  of  C,  and  M  is  its  unit  normal  vector.  This 
geometric  curve  flow  is  illustrated  in  Figure  (5.1). 

If  we  restrict  the  curve  in  Cartesian  coordinates  in  E^  so  that  C  =  (a:,  y)  is  locally 
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Figure  5.1:  A  geometric  curve  flow. 

a  graph,  and  require  points  to  fix  their  a:-coordinate  during  evolution,  then  we  get  a 
different  flow  y{x,  t)  :  M  x  [0,  T)  ^  M  that  can  be  represented  as 


(5.7) 


=  k{x,  t)  sec  9{x,  t) 
y{x,0)  =  yo{x) 

where  yo{-)  is  an  initial  smooth  curve  in  cartesian  coordinates,  9{x,  t)  =  tan“^(y'(a:,  t)), 
"  '  "  denotes  differentiation  with  respect  to  x,  and  k{x,  t)  is  the  curvature  given  by 


k  = 


(1  +  ^/2)3/2- 

The  modification  term  for  vertical  speed  is  sec  9 

sec  9  =  [{1  +  y'‘^Y^‘^- 


(5.8) 


(5.9) 


Substituting  Equations  (5.8)  and  (5.9)  into  (5.7),  we  obtain  an  evolution  of  y  for  a  fixed 


X, 


dy 


(5.10) 


dt  1  +  y'"^ 

In  addition,  if  \y'\  is  bounded,  Grayson  [83]  showed  k{x,t),  as  a  result,  also  evolves 
according  to  equation 

dk  k”  ^  /c  1 1  \ 

k'^.  (5.11) 


k" 


dt  1  +  y'2 

Curvature  is  a  natural  indication  of  sharp  variations  in  a  signal.  When  a  signal  is  con¬ 
taminated  by  an  additive  noise,  it  makes  sense  to  detect  the  singularities  by  tracking 
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the  curvature  extrema  across  curve  evolution.  Equation  (5.11)  indicates  that  the  noise 
riding  on  the  signal  has  relatively  higher  smoothing  evolution  speed  in  comparison  to 
the  underlying  features.  It  thus  tends  to  be  "washed"  away  much  faster  than  the  latter. 
This  disparity  in  evolution  speed  is  key  to  extracting  the  true  features  of  the  signal.  To 
obtain  an  accurate  estimate  of  singularity,  we  also  have  to  ensure  that  no  new  local  ex¬ 
trema  can  be  created  during  the  curve  evolution,  which  is  generally  referred  to  as  the 
Causality  property.  The  following  proposition  validates  this  requirement. 

Proposition  5.1  New  extrema  of  y  and  k  can  not  be  created  in  passing  from  fine  to  coarse 
scales,  i.e.,for  any  ti>  Iq  >  0,  all  the  extrema  points  ofy{-,  ti)  and  k{-,  ti)  are  extrema  points 
in  y{-,  to)  and  k{-,  to). 

Proo/‘ Applying  the  Maximum  Principle  [94],  if  \y'\  is  bounded,  it  follows  directly  that 
new  maxima  and  minima  of  y  and  k  can  not  be  created  as  a  curve  evolves  by  equation 
(5.10).  In  fact,  by  lemma  1.9  in  [83],  we  can  show  that  for  a  given  choice  of  cartesian 
coordinates,  local  minima  of  y  and  k  increase  with  time,  and  local  maxima  decrease. 
Furthermore,  the  points  of  the  curve  where  y  and  k  reach  their  local  minima  and  max¬ 
ima  vary  continuously  with  time.  ■ 

A  test  signal  defined  on  interval  [0, 1]  is  shown  in  Figure  5.2(a).  By  evolving  it,  we  have 
the  curvature  variation  across  scales  as  illustrated  in  Figure  5.2(b),  and  the  curvature 
extrema  propagation  line  across  evolution  scales  in  Figure  5.2(c).  Then  we  define  a 
function  L{x)  to  denote  the  length  of  extrema  propagation  line  originated  at  rc  G  [0,1] 
and  a  set  Vt  =  {x  :  L{x)  >  T}.  T  is  a  threshold  and  we  set  it  to  be  half  of  the  length 
of  the  longest  propagation  line  of  curvature  extrema.  Finally  we  define  a  singularity 
indicator  function  p(-)  as 


Xi^Cl 


(5.12) 
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Figure  5.2:  Singularity  detection  by  tracking  curvature  extrema  through  curve  evolu¬ 
tion 

5.5  A  smoothness  constrained  filter 

Let  /  G  L^(M)  be  a  piecewise  smooth  signal.  Its  observation  Z(-)  is  modeled  as  in 
Equation  (5.2).  Let  ip{x)  be  a  compact  support  wavelet  with  n  vanishing  moments  and 
lTZ(s,  u)  denote  the  continuous  wavelet  transformation  of  Z(-),  which  is  given  by 


WZ{s,u)  =  Z{x)-ki>s,u{x) 


(5.13) 


where 

By  tracking  curvature  extrema  through  curve  evolution  of  measured  data  Z{-),  we  get 
a  singularity  indicator  function  p(-)  as  defined  in  Equation  (5.12). 

Let 


p{u) 


exp(Y^),  \u\<l 

0,  |ri|  >  1 


and  C  be  the  support  of  ?/)(•).  Then  we  define 


(5.14) 
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(3.)  Blocks 


(fc))  Bumps 


Figure  5.3:  Clean  signals  of  Blocks,  Bumps,  HeaviSine  and  Doppler 

11 

q{s,u)  =  p{u)-k  p{—),  s  >  0. 

and  a  multiscale  mask  is  given  by 


h{s,  u) 


q{s,  u),  if  q{s,  u)  <  1 

1,  otherwise. 


(5.15) 


Since  this  mask  is  derived  from  curvature  extrema,  we  call  it  a  multiscale  curvature 
mask.  Then  a  smoothness  constrained  filter  T  for  piecewise  smooth  signals  may  be 
defined  as 

f{x)  =  TZ{x)  =  W-\WZ{s,  u)h{s,  ?/)},  (5.16) 

where  denotes  the  inverse  wavelet  transform. 

To  analyze  the  property  of  this  filter,  we  first  investigate  the  pointwise  Holder  expo¬ 
nents  of  the  measured  data  Z{-). 
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(a)  Noisy  Blocks 


(b)  Noisy  Bumps 


(d)  Noisy  Doppler 


Figure  5.4:  Gaussian  noise  corrupted  signals  of  Blocks,  Bumps,  HeaviSine  and  Doppler 

Proposition  5.2  Let  f  e  L^(M)  be  a  piecewise  smooth  signal  and  N{-)  denote  a  realization  of 
a  zero  mean  noise  process.  Suppose  a^f)  characterize  the  pointwise  Holder  exponent  of  Nf), 
then  the  measured  data  Z  =  f  N  admits  aN{-)  as  its  pointwise  Holder  exponent  except  at  the 
singular  points  of  f. 

Proof: :  see  Appendix  A 

An  ideal  smoothness  constrained  filter,  as  noted  earlier,  is  to  isolate  and  localize  the  sig¬ 
nal  portion  within  the  measured  data.  The  following  results  establish  that  filtering  by 
a  multiscale  curvature  mask  as  defined  in  Equation  (5.16)  is  equivalent  to  keeping  the 
signal  pointwise  Holder  exponents  at  the  singularity  points  of  /  and  lifting  its  smooth¬ 
ness  at  all  the  remaining  points. 


Proposition  5.3  Let  f,g^  L^(E)  and  xpf)  be  a  compact  support  wavelet  with  n  vanishing 
monents,  and  Wff,  •),  Wg{-,  •)  be  the  wavelet  transforms  of  f  and  g  respectively.  Suppose 
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(a)  SUREShrink[Blocks] 


(c)  SUREShrink[Bumps] 


(c)  SUREShrink[HeaviSine] 


(d)  SUREShrink[Doppler] 


Figure  5.5:  Denoising  results  by  SureShrink  for  Gaussian  noise  corrupted  signals 
af{-)  and  ag{-)  are  pointwise  Holder  expontent  functions  of  f,  g  and  satisfy 
af{x)  <  n,  ag{x)  <  n,  (Xfix)  +  (Xgix)  <  n,  re  G  M 


then 

y  =  W-^{Wf{s,  u)Wg{s,  ?/)},  (s,  u)eR+  xR 
admits  (Xf(x)  +  ag(x)  +  ^  as  its  pointwise  Holder  exponent  for  any  re  G  M. 
Proof: :  see  Appendix  A 


Proposition  5.4  Let  ipf)  he  a  wavelet  of  compact  support  and  with  n  vanishing  monents. 
Consider  the  multiscale  curvature  mask  as  defined  in  Equation  (5.15)  and  denote 

y  =  W~^ {h{s,u)},  {s,u)  G  M"*"  X  M 

then  y  belongs  to  the  Holder  space  G”(M)  almost  everywhere. 
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0  0.2  0.4  0.6  0.8  0  0.2  0.4  0.6  0.8 

Figure  5.6:  Multiscale  curvature  mask  for  (a)  Blocks,  (b)  Bumps,  (c)  HeaviSine  and  (d) 
Doppler 

Proof: :  see  Appendix  A 


Proposition  5.5  Let  'tpf)  be  a  wavelet  of  compact  support  and  with  n  vanishing  monents.  Let 
/  G  L^  (M)  be  a  piecewise  smooth  signal.  Its  measured  data  Z{-)is  modeled  as  in  Equation  (5.2). 
Suppose  a]sf{-)  characterizes  the  pointwise  Holder  exponent  of  Nf).  Let  T  be  a  smoothness 
constrained  filter  as  defined  in  Equation  (5.16)  and  /  =  TZ.  Then  V  =  Z  —  f  admits  aN{-)  as 
its  pointwise  Holder  exponent  and  f  belongs  to  the  Holder  space  C^^I)  except  at  the  singular 
points  of  f  where  pointwise  Holder  exponents  of  measured  data  Z(-)  are  preserved. 


Proof: :  see  Appendix  A 
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(a)  CurvFilter[Blocks] 
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(b)  CurvFilter[Bumps] 


(c)  CurvFilter[FleaviSine]  (d)  CurvFilter[Doppler] 


Figure  5.7:  Filtering  results  by  the  smoothness  constrained  filter  for  Gaussian  noise 
corrupted  signals 

5.6  Numerical  Experiments 

Four  piecewise  smooth  signals.  Blocks,  Bumps,  HeaviSine  and  Doppler  are  shown  in  Fig¬ 
ure  (5.3).  These  signals  represent  various  spatially  nonhomogeneous  phenomena  and 
are  widely  used  in  simulations  of  Wavelet  Shrinkage  methods. 

Figure  (5.4)  displays  these  signals  corrupted  by  white  Gaussian  noise.  The  signal  to 
noise  ratio  is  16.9dB.  Figure  (5.5)  shows  the  denoistng  results  of  Wavelet  SureShrtnk  [10] 
technique.  By  searching  for  the  singular  points  of  the  signal  from  noisy  data,  we 
proceed  to  generate  the  multiscale  curvature  mask  for  the  test  signals  as  shown  in 
Figure(5.6).  Figure  (5.7)  demonstrates  the  results  by  the  smoothness  constrained  filter 
as  described  in  Section  V.  Gompared  to  Wavelet  SureShrtnk  technique,  our  smoothness 
constrained  filter  preserves  singularities  of  the  underlying  signals  better  and  recov¬ 
ers  the  smoothness  between  singular  points  better.  Filtering  by  a  multiscale  curvature 
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(a)  WZ(tj,,s),  tg=0.65 


s 


(c)  WZ(t^,s),  t^=0.29 


(b)  Wf(tQ,s),  t^=0.65 


s 


s 


Figure  5.8:  Adjustment  of  the  wavelet  transform  amplitude  decay  across  scales  by 
smoothness  constrained  filter 

mask  is  equivalent  to  keeping  the  signal  pomtwise  Holder  exponents  at  the  singular 
points  of  /,  and  lifting  its  smoothness  at  all  the  remaining  points.  This  effect  is  demon¬ 
strated  in  Figure  (5.8),  which  shows  the  adjustment  of  the  wavelet  transform  amplitude 
decay  of  the  noisy  observation  of  the  test  signal  Blocks  at  fine  scales.  As  we  can  see  in  the 
filtered  data,  the  wavelet  transform  amplitude  decay  at  to  =  0.65,  which  corresponds 
to  a  singular  point  in  Blocks,  is  preserved,  and  the  decay  at  to  =  0.29,  which  is  smooth 
in  Blocks,  is  lifted. 

In  the  non-Gaussian  noise  case,  we  test  the  above  four  signals  which  contaminated  by 
Laplacian  noise  with  SNR  of  16.9dB.  Figure(5.9)  shows  the  Laplacian  corrupted  noisy 
signals  and  Figure  (5.10)  demonstrates  the  denoising  results  from  our  proposed  filter 
and  Figure  (5.11)  shows  the  results  from  standard  SureShrink  techniques^.  Laplacian 

^  Fully  realizing  SureShrink  was  not  optimal  for  an  non-Gaussian  noise,  this  is  used  to  merely  illustrate 
our  technique. 
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(a)  Noisy  Blocks 


(c)  Noisy  HeaviSine 


(b)  Noisy  Bumps 


Figure  5.9:  Laplacian  noise  corrupted  signals  of  Blocks,  Bumps,  HeaviSine  and  Doppler 

distribution  has  a  heavier  tail  than  Gaussian  distribution  which  makes  spurious  spikes 
on  noisy  signals  more  likely.  Wavelet  Shrinkage  methods  are  not  robust  to  heavy  tail 
noise,  as  we  can  see  in  the  denoising  result  of  SureShink,  some  spurious  spikes  are  kept. 
However,  performance  of  the  proposed  filter  is  not  degraded  as  demonstrated  in  Fig¬ 
ure  (5.10). 

Since  the  smoothness  constrained  filter  preserves  the  decay  of  the  wavelet  transform 
amplitude  at  the  singular  point  of  /,  it  has  the  advantage  of  singularity  preservation. 
In  Figure  (5.12),  we  test  a  planar  shape,  B-52  contour,  with  the  proposed  filter  to  further 
demonstrate  this  desired  property.  Figure  (5.12c)  shows  the  result  shape  by  our  filter,  in 
comparison  with  the  result  from  a  low  pass  Gaussian  filter  in  Figure  (5.12d),  it  is  clear 
that  the  features  of  the  shape  is  better  preserved  by  the  smoothness  constrained  filter. 

It  is  important  to  note  that  our  derivation  does  not  rely  on  any  statistical  independence 
assumption,  and  can  hence  be  extended  to  images.  Suppose  an  image  can  be  viewed 
as  a  differentiable  function  2;  G  L^(M^),  then  we  can  generalize  the  curve  evolution 
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(a)  CurvFilter[Blocks] 
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(c)  CurvFilter[HeaviSine] 


(b)  CurvFilter[Bumps] 


(d)  CurvFilter[Doppler] 


Figure  5.10:  Filtering  results  by  the  smoothness  constrained  filter  for  Laplacian  noise 
corrupted  signals 


equation  (5.10)  to  two-dimensional  case,  z{x,  y,  t)  :  x  [0,  T)  ^ 

dz{x,y,t)  _  Zxx+Zyy  _  Az 


(5.17) 


dt  ^+zl+z‘^  1+I|V^:||2 

z{x,y,0)  =  zo{x,y) 

where  Zq  is  an  initial  smooth  image  in  Cartesian  coordinates.  Applying  the  Maximum 
Principle,  it  follows  directly  that  if  ||  V2:|p  is  bounded,  new  extrema  can  not  be  created 
as  zq  evolves  from  fine  to  coarse  scales.  We  therefore  can  find  the  true  edges  of  zq  by 
tracking  the  propagation  line  of  its  extrema  across  scale.  Then,  similarly  to  Section  IV, 
we  define  L{x,  y),  {x,  y)  G  to  be  the  length  of  extrema  propagation  line  originating 
at  {x,  y),  and  the  set  Cl  =  {{x,y)  :  L{x,  y)  >  T}  to  represent  the  detected  edge  points.  T 
is  a  threshold  which  we  set  to  be  half  of  the  length  of  the  longest  extrema  propagation 
line.  Finally  we  define  a  edge  indicator  function  p(-,  •)  as 

p{ui,U2)=  ^  5{ui- Xi,U2- yi). 

(Xi,yi)e^ 


(5.18) 
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(a)  SUREShrink[Blocks]  (c)  SUREShrink[Bumps] 


(c)  SUREShrink[HeaviSine]  (d)  SUREShrink[Doppler] 


Figure  5.11:  Denoising  results  by  SureShrink  for  Laplacian  noise  corrupted  signals 

Consider  an  additive  noise  observation  model,  an  image  /  e  L^(]R^)  contaminated  by 
addition  of  a  noise.  This  noise  is  modeled  by  the  realization  of  a  zero  mean  random 
process  {A},  i.e., 

Z{x,y)  =  f{x,y)  +  N{x,y),  (5.19) 

Let  ?/)(•)  be  a  compactly  supported  wavelet  with  n  vanishing  moments.  We  define  a  two 
dimensional  separable  wavelet  y)  =  ip{x)ip{y)  and  denote 

=  tpsr,m{x)i)s2,U2{y),  Sl,S2  >  0,  ^ 

Let  WZ{si,  ui,  S2,  U2)  denote  the  wavelet  transformation  of  Z{x,  y),  which  is  given  by 

WZ{si,Ui,S2,U2)  =  Z{x,y)-kip^^^^^^^^^^^^{x,y).  (5.20) 

By  setting  zo{x,  y)  =  Z{x,  y)  in  equation  (5.17),  an  edge  indicator  function  of  the  under¬ 
lying  image  /  can  be  obtained  as  in  equation  (5.18).  With  this  edge  indicator  function. 
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(c) 


(d) 


Figure  5.12:  Denoising  shapes  by  the  smoothness  constrained  filter,  (a)  Shape  B-52  con¬ 
tour;  (b)  noise  corrupted  shape;  (c)  Resulting  shape  by  the  proposed  filter;  (d)  Resulting 
shape  by  the  low  pass  Gaussian  filter. 


we  can  proceed  to  generate  a  two-dimensional  multiscale  mask.  Let 


ul  +  ul<l 


p(Ui,U2)  = 


0, 


Ui  +  U2>  1 


Then  we  define 


q{si,Ui,S2,U2)  =  p{ui,U2)-k  Si,S2  >  0, 

G  5i  0^2 

where  C  is  the  support  of  '0(-).  A  two-dimensional  multiscale  mask  can  be  given  by 


/i(si,  Ui,  S2,  U2)  =  min(  g(si,  Ui,  S2,  U2),  1  ), 

and  the  resulting  filter  for  images  may  be  defined  as 


f{x,  y)  =  W2  ^{WZ{si,  ui,  S2,  U2)h{si,  ui,  S2,  U2)} 
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where  W2  ^  denotes  a  2-D  inverse  wavelet  transform. 

Figure  (5.13a)  displays  an  original  image  bar  girl,  which  is  well  known  in  the  image 
processing  literature.  Figure  (5.13b)  shows  the  Gaussian  noise  corrupted  image  with  a 
signal  to  noise  ratio  of  9.b3dB.  By  searching  for  the  edge  points  of  bar  girl  from  noisy 
data  as  described  in  equation  (5.17),  an  edge  indicator  function,  displayed  in  Figure 
(5.13c),  is  obtained.  We  proceed  to  generate  an  two-dimensional  mutliscale  mask  with 
this  edge  indicator  function.  Figure  (5.13e)  demonstrates  the  result  by  our  proposed 
filter.  The  SNR  is  improved  to  nearly  3dB.  For  non-Gaussian  distributed  noise.  Fig¬ 
ure  (5.13f)  displays  a  Laplacian  noise  corrupted  image  with  a  SNR  of  9.b4:dB.  With  the 
same  procedure,  the  result  by  the  proposed  filter  is  shown  in  Figure  (5.13f).  In  this  case, 
the  signal  to  noise  ratio  is  increased  to  12AAdB.  By  the  same  argument  for  1-D  smooth¬ 
ness  constrained  filter,  filtering  by  2-D  multiscale  mask  is  equivalent  to  keeping  image 
pointwise  Holder  exponents  at  the  edge  points  of  the  underlying  image  and  lifting  its 
smoothness  at  all  the  remaining  points. 


5.7  Conclusion 

In  this  chapter,  we  proposed  a  novel  non-linear  smoothness  constrained  filtering  tech¬ 
nique.  The  key  idea  is  to  separate  the  signal  portion  out  of  its  measured  data,  and  to 
preserve  the  original  smoothness  property  of  the  underlying  signal.  We  first  briefly 
reviewed  the  definition  of  Holder  space.  Holder  exponent  and  established  results  of 
signal  regularity  measurement  with  wavelets.  To  detect  the  singular  points  of  signal 
from  measured  data,  we  turned  to  curve  shortening  and  derived  the  partial  differen¬ 
tial  equations  that  characterize  the  evolution  of  curvature.  A  new  singularity  detection 
method  by  tracking  the  curvature  extrema  across  scales  is  proposed  and  a  multiscale 
curvature  mask  is  generated.  Then  we  proceed  to  project  measured  data  into  wavelet 
domain  and  suppress  wavelet  coefficients  by  this  multiscale  curvature  mask.  For  a 
piecewise  smooth  signal,  it  can  be  shown  that  filtering  by  this  curvature  mask  is  equiv- 
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alent  to  keeping  the  signal  pointwise  Holder  exponents  at  the  singular  points  of  the 
underlying  signal  and  lifting  its  smoothness  at  all  the  remaining  points.  An  extension 
of  the  non-linear  smoothness  constrained  filter  to  image  processing  is  studied  at  the 
end  of  this  paper.  Numerical  experiments  demonstrated  that  our  approach  is  effective 
and  efficient. 


5.8  Appendix 


Proof  of  Proposition  5.2 

Suppose  a:  G  M  is  not  a  singular  point. 

Let  a  =  a]\f{x),  then  there  exists  a  polynomial  PN^f)  of  degree  m  =  [oj  such  that 


lim 

A^-O 


|iV(a:  +  A)-PiV,(A)| 
|A|/3 


0,  f  <  a; 


lim 

A^O 


|iV(a;  +  A)-PiV,(A)| 
|A|/3 


-|-cx),  f  >  a. 


Since  /(•)  is  a  piecewise  smooth  signal,  /  is  infinitely  differentiable  at  x  and  there  exists 
a  polynomial  Pfxf)  of  degree  m  =  [/5J  for  any  /3  >  0  such  that 


Let  PZ,  =  PN,  +  PU 


A^-O  |A|^ 


0. 


•  For  any  f  <  a, 


< 

+ 


lim 

A^-O 


lim 

A^-O 


lim 

A^-O 


|Z(a:  +  A)-PZ,(A)| 
|A|/3 

|/(a:  +  A)-P/,(A)| 
|A|/3 

|A(a;  +  A)-PA,(A)| 

|Ap 


0. 
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•  For  any  /5  >  n. 


> 


lim 

A^O 


lim 

A^-O 


lim 

A^-O 


|Z(a:  +  A)-PZ,(A)| 
|A|/^ 

|A(a:  +  A)-PA,(A)| 
|A|/3 

|/(3;  +  A)-P/.(A)| 
|A|/5 


By  the  definition  of  Holder  exponent,  we  conclude  that  Z(x)  admits  ajv(x)  for  all  x  e  M 
except  the  singular  points  of  /.  ■ 


Proof  of  Proposition  5.3 
f  is  Holder  af{x)  <  n  at  x,  we  have 
V(s,  ti)  e  M"*"  X  M, 

\Wf(s,u)\ < 

s 

g  is  Holder  ag{x)  <  n  at  x,  we  have 
V(s,  ti)  e  M"*"  X  M, 

\Wg{s,u)\ < 

therefore. 


|FF/(s,?/)FFy(s,?/)|  <  ^5«/W+«pW+1(i  + 

~  ^  \af(x)  _|_  \aa{x)  _|_  ~  ^  \ai{x)+aa{x)\ 

s  s  s 

Let 


.u-x  , 


s 

,u  —  X 


=  max(|^^|“/(^\ 
s 


ag(x) 


,U  —  X 


af{x)+ag{x) 


and  it  is  then  clear  that  a'  <  af{x)  +  ag{x)  +  Then  we  can  rewrite  the  above  equation 


as 


\Wf{s,u)Wg{s,u)\  <  +  \- — 


by  theorem  5.1  we  conclude  that 


y  =  W-^{Wf{s,u)Wg{s,u)} 
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admits  af{x)  +  ag{x)  +  ^  as  its  pointwise  Holder  exponent  for  any  x  e  M.  ■ 

Proof  of  Proposition  5.4 

We  first  introduce  without  proof  a  theorem  from  Meyer  [16]  and  Mallat  [70]. 

Theorem  5.2  Let  fif)  he  a  wavelet  with  n  vanishing  moments  and  f  e  L^(E)  belongs  to  a 
global  Holder  space  C°'{a,  b),  and  W f{-,  •)  denotes  its  wavelet  transform.  Then  there  exists  a 
constant  A  >  0  such  that 

y{s,u)  G  M+  X  (a,  6), \Wf{s,u)\  <  (5.21) 

Conversely,  suppose  f  is  bounded  and  W  f{s,  u)  satisfies  equation  (5.21)  for  an  a  <  n  that  is 
not  an  integer.  Then  f  belongs  to  the  global  Holder  space  C"  (a  +  e,  6  —  e)  for  any  e  >  0. 


Let  Tl  =  {xi}f^f  be  the  set  of  all  the  singular  points  of  /.  For  convenience,  denote 
Xo  =  — oo  and  X]^  =  +oo.  Observe  that  for  any  e  >  0,  there  exists  Sq  such  that 

h{s,  u)  =  0,  Vs  <  So,  u  E  (yXi  -\-  e,  —  e), 

i  =  0,1,... A  —  1.  Therefore,  by  an  appropriate  choice  of  A,  we  obtain 
\h{s,  rio)|  <  As”'''2  ,  V(s,  ?/)  G  M"*"  X  (xi  +  e,  x,_|_i  —  e), 
i  =  0,1,  ...A  —  1.  This  concludes  the  proof  of  proposition  5.4.  ■ 

Proof  of  Proposition  5.5 

Let  U  =  {xi}fj~f  be  the  set  of  all  the  singular  points  of  /  and  denote  xq  =  —  cxd  and 
xn  =  +00.  By  propositions  5.3  and  5.4,  it  follows  directly  that  /  belongs  to  the  global 
Holder  space  C'^{xi  +  e,  x,+i  —  e)  for  any  e  >  0,  i  =  0, 1,  ...A  —  1.  Let  v  =  Xi  and  suppose 
the  pointwise  Holder  exponent  of  Z  at  x  is  a,  then  V(s,  ti)  G  M"''  x  M, 

|W/(s,x)|  =  \WZ{.s,u)h{s,u)\ 

s 
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Hence,  the  pointwise  Holder  exponents  of  measured  data  Z(-)  are  preserved  at  the 
singular  points  of  /.  Let  V  =  Z  —  f.  Observe  for  any  e  >  0,  there  exists  Sq  such  that  for 

all  s  <  So, 

u  G  (xi  -h  e,  rcj+i  —  e),  WV (s,  u)  =  WZ(s,u), 

and  for  all  s  >  sq. 


u  G  (xi  -h  e,  Xij^i  —  e),  WV (s,  u)  <  W Z{s,u), 

z  =  0, 1,  ■■■N  —  1.  By  proposition  5.2  and  theorem  5.1,  V{-)  admits  aN{-)  as  its  pointwise 
Holder  exponent  except  at  the  singular  points  of  f.  This  concludes  the  proof  of  proposi¬ 
tion  5.5.  ■ 
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Figure  5.13:  (a)  Original  bar  girl;  (b)  bar  girl  plus  Gaussian  noise;  (c)  Edge  points  ex¬ 
tracted  from  noisy  image;  (d)  Filtered  result  from  the  Gaussian  noise  corrupted  image; 
(e)  bar  girl  plus  Laplacian  noise;  (f)  Filtered  result  from  the  Laplacian  noise  corrupted 
image; 


Chaptett 


6 


A  Multiscale  Ap¬ 
proach  for  Pixel-Level 
Image  Fusion 


6.1  Introduction 


WITH  the  development  of  new  imaging  sensors  arises  the  need  for  image  pro¬ 
cessing  techniques  that  can  effectively  fuse  images  from  differenf  sensors  into 
a  single  composite  entity  for  inferprefafion.  Fusion  begins  with  two  or  more  registered 
images  that  contain  different  representations  of  the  same  scene.  They  may  come  from 
differenf  viewing  conditions,  or  even  different  sensors  (visible  and  IR,  CT  and  MRI, 
etc.).  Image  fusion  of  multiple  sensors  in  a  vision  sysfem  could  significantly  reduce 
human/ machine  error  in  detection  and  recognition  of  objects  due  to  the  inherent  re¬ 
dundancy  and  extended  coverage.  For  example,  fusion  of  forward  looking  infrared 
(FLIR)  and  low  lighf  felevision  images  (LLTV)  obtained  by  an  airborne  sensor  platform 
would  aid  a  pilof  fo  navigafe  in  poor  weather  conditions. 

The  actual  fusion  process  can  fake  place  af  differenf  levels  of  information  represen- 
fafion.  A  generic  cafegorizafion  is  fo  consider  the  different  processes  at  signal,  pixel, 
feature  and  symbolic  levels  [23].  We  focus  on  the  so-called  pixel  level  fusion  pro¬ 
cess,  where  a  composite  image  has  to  be  constructed  from  several  inpuf  images.  Some 
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Figure  6.1:  A  general  framework  for  multiscale  fusion  with  wavelet  transform 


generic  requirements  must  be  imposed  on  the  fusion  result.  The  fusion  process  should 
preserve  all  relevant  information  of  the  input  imagery  in  the  composite  image  (pattern 
conservation).  The  fusion  scheme  should  not  introduce  any  artifacts  or  inconsistencies 
which  would  distract  a  human  observer  or  the  following  processing  stages  (Causality). 
Over  the  past  two  decades,  a  wide  variety  of  pixel-level  image  fusion  algorithms  has 
been  developed.  These  techniques  may  be  classified  into  linear  superposition,  logical 
filter  [17],  mathematical  morphology  [18],  image  algebra  [19]  [20],  artificial  neural  net¬ 
work  [21],  and  simulated  aimealing  [22]  methods.  Each  of  these  algorithms  focuses  on 
the  fact  that  the  fused  image  reveals  new  information  concerning  features  that  can  not 
be  perceived  in  individual  sensor  images.  However,  some  useful  information  has  been 
discarded  since  each  fusion  scheme  tends  to  emphasize  different  attributes  of  the  im¬ 
age.  Luo  [23]  provides  a  detailed  review  of  these  techniques. 

Inspired  by  the  fact  that  the  human  visual  system  processes  and  analyzes  image  in¬ 
formation  at  different  scales,  researchers  recently  proposed  a  multiscale  based  fusion 
method  which  is  widely  accepted  [24]  as  one  of  the  most  effective  techniques  for  im¬ 
age  fusion.  A  multicale  transform,  which  may  be  a  pyramid  or  wavelet  transform,  is 
first  calculated  for  each  input  image,  then  a  composite  is  formed  by  selecting  the  coeffi¬ 
cients  from  the  multicale  representations  of  all  the  source  images.  Finally,  a  fused  image 
is  recovered  through  an  inverse  transformation.  In  the  pioneering  work  of  Burt  [28],  a 
Laplacian  pyramid  and  a  "choose  max"  rule  is  proposed  as  a  model  for  binocular  fu- 
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sion  in  human  stereo  vision.  For  each  coefficient  in  the  pyramids  of  source  images,  the 
one  with  the  maximum  amplitude  is  copied  to  the  composite  pyramid  and  the  fused 
image  is  reconstructed  from  an  inverse  pyramid  transform.  More  recently  [95],  fusion 
within  a  gradient  pyramid  was  shown  to  provide  improved  stability  and  noise  immu¬ 
nity. 

Wavelet  theory  has  played  a  particularly  important  role  in  multiscale  analysis.  A  num¬ 
ber  of  papers  [25]  [26]  [27]  have  addressed  fusion  algorithms  based  on  the  orthogonal 
wavelet  transform.  A  general  framework  for  multiscale  fusion  with  wavelet  transform 
is  shown  in  Figure  (6.1).  The  wavelet  transform  offers  certain  advantages  over  the 
Laplacian  pyramid-based  techniques.  Since  the  wavelet  bases  are  chosen  to  be  orthog¬ 
onal,  the  information  gleaned  at  different  resolutions  is  unique;  on  the  other  hand, 
the  pyramid  decomposition  contains  redundancy  between  different  scales.  Further¬ 
more,  a  wavelet  image  representation  provides  directional  information  in  the  high-low, 
low-high  and  high-high  bands,  while  the  pyramid  representation  fails  to  introduce  any 
spatial  orientation  selectivity  into  the  decomposition  process.  A  major  drawback  in 
the  recent  pursuit  of  wavelet-based  fusion  algorithms  is  due  to  a  lack  of  a  good  fu¬ 
sion  scheme.  Most  selection  rules  so  far  proposed  are  in  essence  more  or  less  similar 
to  "choose  max",  which  introduces  a  significant  amount  of  high  frequency  noise  due 
to  the  sudden  switch  of  the  fused  wavelet  coefficient  to  that  which  is  maximum  of  the 
source.  This  high  frequency  noise  is  particularly  undesirable  to  visual  perception. 

In  this  chapter,  we  apply  a  biorthogonal  wavelet  transform  to  the  pixel  level  image 
fusion.  It  is  possible  to  construct  smooth  biorthogonal  wavelets  of  compact  support 
which  are  either  symmetric  or  antisymmetric.  At  the  exception  of  a  Flaar  wavelet,  it 
has  been  shown  [29]  that  symmetric  orthogonal  wavelets  are  impossible  to  construct. 
Symmetric  or  antisymmetric  wavelets  are  synthesized  with  perfect  reconstruction  fil¬ 
ters  having  a  linear  phase.  This  is  a  desirable  property  for  image  fusion  applications. 
Unlike  the  "choose  max"  type  of  selection  rules,  we  propose  an  information  theoretic 
fusion  scheme.  For  each  pixel  in  a  source  image,  a  vector  consisting  of  wavelet  coef¬ 
ficients  at  that  pixel  position  across  scales  is  formed  to  indicate  the  "activity"  of  that 
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pixel.  We  denote  these  indicator  vectors  of  all  the  pixels  in  a  source  image  as  its  activity 
map.  A  decision  map  is  then  obtained  by  applying  an  information  theoretic  divergence 
to  measure  all  the  source  activity  maps.  To  make  a  reasonable  comparison  among  activ¬ 
ity  indicator  vectors,  we  apply  our  newly  proposed  divergence  measure,  Jensen-Renyi 
divergence,  which  is  defined  in  terms  of  Renyi  entropy  [5].  Wavelet  coefficients  of  the 
fused  image  are  finally  selected  according  to  the  decision  map.  Since  all  the  scales,  from 
fine  to  coarse,  are  considered  to  evaluate  the  activity  at  a  particular  position  within  an 
image,  our  approach  is  intrinsically  more  accurate,  in  the  sense  of  selecting  coefficients 
containing  the  richest  information,  relative  to  the  "select  max"  type  of  fusion  schemes. 
This  chapter  is  organized  as  follows.  We  briefly  introduce  a  biorthognal  wavelet  rep¬ 
resentation  of  images  in  Section  II.  A  concise  formulation  of  the  problem  is  given  in 
Section  III.  In  Section  IV,  we  describe  the  information  theoretic  fusion  algorithm  and 
present  some  numerical  experiments.  Finally,  we  provide  concluding  remarks  in  Sec¬ 
tion  V. 


6.2  Biorthogonal  Wavelet  Representation  of  Images 


Let  {d,  d}  {?/),  'ip}  be  two  dual  pairs  of  scaling  functions  and  wavelets  which  gener¬ 


ate  biorthogonal  wavelet  basis  of  L^(]R).  For  t  =  (ti,  ^2)  and  n  =  (ni,  712),  we  write 


(6.1) 

(6.2) 

(6.3) 


^^{t)  =  ^(fl)^(f2),  ^^(t)  =  1^(tl)d(f2),  ^^{t)  =  ^{tl)^{t2). 


Denote  for  1  <  A:  <  3, 


(6.4) 


(6.5) 


and 
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It  is  easy  to  verify  [70]  that 

and 

are  biorthogonal  bases  of 

Any  /  e  L^(E^)  has  two  possible  decompositions  in  these  bases, 

/  =  E  E  E  <  /■  =  E  E  E  <  /.  iin  >  «>-6) 

j  n  k=i  j  n  k=i 

where  <  •,  •  >  denotes  an  inner  product  of  two  functions.  Assuming  we  choose  as 
the  analysis  wavelets,  at  any  scale  2E  we  denote  the  approximation  coefficient 

aj[n]  =<  /,  (Pin  > 

and  the  wavelet  coefficient 

d^^[n]=<f,pjln>,  k  =  1,2,3. 

The  three  wavelets  extract  image  details  at  different  scales  and  orientations.  Over 
positive  frequencies,  (p  and  xp  have  an  energy  mainly  concentrated  respectively  on  lower 
and  higher  frequencies.  Let  a;  =  (wi,  102),  the  separable  wavelet  expressions  implies  that 

=  <^(Wi)'0(w2),  =  '0(Wl)'0(tJ2)-  (6.7) 

Hence  |v^^(a;)|  emphasizes  low  horizontal  frequencies  uji  and  high  vertical  frequencies 
(jj2,  while  \‘xp‘^{(j:)  \  is  larger  at  high  horizontal  frequencies  uji  and  low  vertical  frequencies 
ijj2r  whereas  |'0^(a;)|  is  larger  at  both  high  horizontal  frequencies  uji  and  high  vertical 
frequencies  U2.  As  a  result,  wavelet  coefficients  calculated  with  xp^  and  xp‘^  are  larger 
along  edges  which  are  respectively  horizontal  and  vertical,  and  xp'^  produces  large  coef¬ 
ficients  at  the  corners. 

The  wavelet  coefficients  at  scale  2^+^  are  calculated  from  with  two  dimensional  sep¬ 
arable  convolutions  and  subsamplings.  Let  {h,  g}  and  {h,  g}  be  the  perfect  reconstruc¬ 
tion  filters  associated  to  the  biorthogonal  wavelet  {xp,‘^'\.  For  any  pair  of  one-dimensional 
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Rows  Columns  Columns  Rows 


(a)  (b) 


Figure  6.2:  A  fast  biorthogonal  two-dimensional  wavelet  transform  (a)  and  its  inverse 
transform  (b)  implemented  by  perfect  reconstruction  filter  banks. 

filters  y[n\  and  z[n],  we  write  the  product  filter  yz[n\  =  y[ni\z[n2\,  and  denote  y[n\  = 
y[-n].  For  n  =  (ni,n2). 


«i+iW 

=  ttj  -k  hh[n], 

(6.8) 

=  ttj  *  hg[n], 

(6.9) 

=  ttj  -kgh[n], 

(6.10) 

d5+i[n] 

= 

(6.11) 

where  denotes  convolution.  A  separable  two  dimensional  convolution  can  be  fac¬ 
tored  into  one-dimensional  convolutions  along  with  rows  and  columns  of  the  image. 
The  factorization  is  illustrated  in  Figure  (6.2a).  The  rows  of  aj  are  first  convolved  with 
h  and  g,  and  subsampled  by  2.  The  columns  of  these  two  output  images  are  then  con¬ 
volved  respectively  with  h  and  g  and  subsampled,  which  yields  four  subsampled  im¬ 
ages  ttj+i,  and 

We  denote 

_  1  .r  n  I  ,  ni  =  2ki,n2  =  2k2 

|/[nj  =  |/[ni,n2j  =  < 

I  0  ,  otherwise 

the  image  obtained  by  inserting  a  row  of  zeros  and  a  column  of  zeros  between  pairs  of 
consecutive  rows  and  columns  of  y[ni,n2].  aj  is  recovered  from  the  coarser  scale  ap¬ 
proximation  ttj+i  and  the  wavelet  coefficients  dj+i,  dj+i  and  with  two-dimensional 
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separable  convolutions 

aj[n]  =  -k  hh[n]  +  -k  hg[n]  +  -kgh[n]  +  d^_^_i  -kgg[n].  (6.13) 

These  four  convolutions  can  also  be  factored  into  six  groups  of  one-dimensional  con¬ 
volutions  along  rows  and  columns,  as  illustrated  in  Figure  (6.2b). 

Let  aj[n]  be  a  digital  image  whose  pixel  interval  equals  to  2'^  =  N~^.  A  biorthogonal 
wavelet  image  representation  of  aj  of  depth  L  —  J  is  computed  by  iterating  Equation 
(6.8-6.11)  for  J  <3  <L: 

{d^j,d^,d^,aL}j<j<L-  (6.14) 

The  original  digital  image  a  j  is  recovered  from  this  wavelet  representation  by  iterating 
the  reconstruction  Equation  (6.13)  for  J  <3  <L. 


6.3  Problem  Formulation 


Let  fi,  f 2,- fn  ■  ^  M  be  digital  images  of  the  same  scene  taken  from  different  sen¬ 

sors.  For  the  pixel  level  image  fusion  problem,  we  assume  all  the  source  images  are 
registered  so  that  the  difference  in  resolution,  coverage,  treatment  of  a  theme,  char¬ 
acteristics  of  the  image  acquisition  methods  are  eliminated.  The  goal  of  our  fusion 
algorithm  is  to  construct  a  composite  image  such  that  information  captured  in  all  the 
source  images  are  combined  and  the  source  image  data  is  hence  compressed.  To  achieve 
this  goal,  we  apply  an  information  theoretic  fusion  approach  based  on  a  biorthogonal 
wavelet  representation. 


Definition  6.1  Let  W  fi  =  {dl{j,n),d‘f{j,n),dl{j,n),ai{L,n)}o^j<L^nezh  *  =  l,‘2,...nbe 
a  biorthogonal  wavelet  image  representation  of  fi  as  defined  in  equation  (6.14).  With  no  loss  of 
generality,  we  set  J  =  0.  For  any  n  G  Z^,  an  activity  pattern  vector  is  defined  as 


A{n)  = 


^14(1,2^  ^n)|^^|4(2,2^  ^n)\f...,'^\d^{L,n) 


.k=l 


k=l 


k=l 


(6.15) 
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which  is  a  {1  X  L)  vector  of  energy  concentrated  at  pixel  fi(2^n)  across  scales.  We  refer  to 
{Ai{n)}nez'^  the  activity  map  of  source  image  fi,  i  =  1,2,  ...n. 

Activity  maps  characterize  the  inherent  information  pattern  in  source  images.  To  fuse 
the  source  wavelet  coefficients,  it  is  necessary  to  compare  the  activity  patterns  for  every 
pixel.  For  instance,  if  the  activity  patterns  are  different  in  some  region,  taking  the  av¬ 
erage  of  wavelet  coefficients  to  generate  a  composite  image  is  not  a  good  choice,  since 
it  would  create  artifacts.  On  the  other  hand,  if  the  activity  patterns  are  similar  for  that 
region,  taking  the  average  would  inject  more  information  to  the  fused  image  due  to  the 
contribution  from  different  sources. 

A  reasonable  measure  for  activity  patterns  should  satisfy  the  following  properties: 

•  It  should  be  capable  of  measuring  the  difference  between  two  or  more  activity 
patterns. 

•  It  should  be  nonnegative  and  symmetric. 

•  It  should  vanish  to  zero  if  and  only  if  the  activity  patterns  are  the  same. 

•  It  should  reach  the  maximum  value  when  activity  patterns  are  degenerate  distri¬ 


butions. 


In  Chapter  3,  we  proposed  a  new  information  divergence  measure,  referred  to  as  Jensen- 
Renyi  divergence,  which  satisfies  the  above  requirements.  A  decision  map  is  then  gen¬ 
erated  by  applying  the  Jensen-Renyi  divergence  to  measure  the  coherence  of  source 
activity  maps  at  the  pixel  level.  We  further  segment  the  decision  map  into  two  regions, 
Dq  and  Di.  Dq  is  the  set  of  pixels  whose  activity  patterns  are  similar  in  all  the  source 
images,  while  Di  is  the  set  of  pixels  whose  activity  patterns  are  different.  Our  fusion 
scheme  is  to  find  the  solution  to  the  following  optimization  problem. 


W  f  =  arff  min 


E  E  E  iH'7o»-»7.an)N -E  E 


*=i  \t=i  2jneDo 


L 


j=i  2Jne-Di 


L 


(6.16) 


6.4.  IMAGE  FUSION  WITH  JENSEN-RENYI  DIVERGENCE 


119 


where  T  is  the  set  of  all  the  images  /  whose  wavelet  transform  satisfies 


mm{W fi{j,nj)  <  Wf{j,n)  <  max{W fi{j,n)), 


for  any  0  <  j  <  L  and  n  G  This  constraint  makes  sure  that  the  solution  stays  in  the 


closure  of  T,  i.e.,  no  image  outside  the  scenario  we  are  contemplating. 

6.4  Image  Fusion  with  Jensen-Renyi  divergence 

The  goal  of  image  fusion  is  to  integrate  complementary  information  from  multi-sensor 
data  such  that  the  fused  images  are  more  suitable  for  the  purpose  of  human  visual  per¬ 
ception.  Let  fi,  f 2,  ■■■fn  ■  ^  E  be  the  digital  images  generated  by  different  sensors. 

Our  information  theoretic  fusion  approach  first  calculates  a  biorthogonal  wavelet  im¬ 
age  representation  for  each  fi,  then  a  pixel  level  activity  map  is  formed,  as 

described  in  Section  III.  Denote  ||  •  ||  as  /i  norm,  for  any  n  G  Z^,  we  define  a  normalized 
activity  pattern 


where  Ai  =  [1,0,  ...0]  is  a  (1  x  L)  degenerate  distribution.  To  fuse  the  source  wavelet 
coefficients,  we  compare  the  normalized  activity  patterns  of  all  the  source  images  in 
terms  of  Jensen-Renyi  divergence,  and  create  a  selection  map  {S'(n)}nez2  : 


S{n)  =  JR^  (Pi(n),  • 


(6.17) 


The  selection  map  is  further  segmented  into  two  decision  regions,  Dq  and  Di.  Set  T  to 
be  the  mean  value  of  selection  map,  we  write 


Do  =  {n  G  Z^  :  S([2-^nJ)  <  T) 


and 


A  =  {n  G  Z^  :  S([2-^nJ)  >  T}, 
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where  [xj  denotes  the  integer  part  of  x. 

Since  Uq  n  =  0,  the  solution  to  Equation  (6.16)  can  be  obtained  by  searching  for  each 
wavelet  coefficient  of  the  fused  image  individually.  Let  /  be  a  composite  image  defined 
by  its  wavelet  coefficients 


n),  dl{j,  n),  dj{j,  n),  af{L,  n)}o<j<L,nez2, 


where 


2^n  e  Do 
2^n  e  Di 


and  for  A;  =  1,  2,  3, 


It  is  thus  easy  to  verify  that  /  satisfies  our  fusion  criteria  (6.16). 

Four  examples,  including  multi-sensor  navigation  image  fusion,  multi-modality  med¬ 
ical  image  fusion,  multi-spectral  remote  sensing  image  fusion,  and  multi-focus  optical 
image  fusion  are  now  presented  to  illustrate  the  fusion  scheme  defined  above. 

6.4.1  Multi-Sensor  Image  Fusion 

To  help  helicopter  pilots  navigate  under  poor  visibility  conditions,  such  as  fog  or  heavy 
rain,  helicopters  are  equipped  with  several  imaging  sensors,  which  can  be  viewed  by 
the  pilot  in  a  helmet  mounted  display.  A  typical  sensor  suite  includes  both  a  low-light- 
television  (LLTV)  sensor  and  a  thermal  imaging  forward-looking-infrared  (FLIR)  sen¬ 
sor.  In  the  current  configuration,  the  pilot  can  choose  one  of  the  two  sensors  to  watch 
in  his  display.  Sample  LLTV  and  FLIR  images  are  shown  in  Figure  (6.3.1)  and  (6.3.2) 
respectively.  A  possible  improvement  is  to  combine  both  imaging  sources  into  a  single 
fused  image. 

Image  fusion  by  standard  techniques  such  as  pixel  averaging  and  multiscale  based 
maximum  selection  scheme  are  shown  in  Figure  (6.3.3)  and  (6.3.4)  respectively.  Note 
that  the  pixel  averaging  method  has  a  muddy  appearance.  This  is  due  primarily  to  the 
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fact  that  averaging  results  in  reduced  contrast  for  all  the  patterns  which  appear  in  only 
one  source.  On  the  other  hand,  maximum  selection  scheme  produces  some  mosaic  like 
artifacts  due  to  the  high  frequency  noise  introduced  by  a  sudden  switch  between  two 
sets  of  source  wavelet  coefficients.  Image  fusion  with  our  multiscale  information  theo¬ 
retic  approach  is  illustrated  in  Figure  (6.3.6).  As  we  can  see,  all  the  significant  features 
from  both  sources  are  retained  in  the  fused  image  without  suffering  from  artifacts. 


6.4.2  Multi-Modality  Image  Fusion 

With  the  development  of  new  imaging  methods  in  medical  diagnostics  arises  the  need 
of  meaningful  and  spatial  correct  combination  of  all  available  image  datasets.  Examples 
for  imaging  devices  include  computer  tomography  (CT),  magnetic  resonance  imaging 
(MRI)  or  the  newer  positron  emission  tomography  (PET).  Image  fusion  of  a  CT  (Fig¬ 
ure  6.4.1)  and  a  MRI  (Figure  6.4.2)  image  with  our  multiscale  information  theoretic 
approach  is  illustrated  in  Figure  (6.4.6).  For  comparison  purpose,  fusion  by  pixel  aver¬ 
aging  and  by  multiscale  based  maximum  selection  scheme  are  shown  in  Figures  (6.4.3) 
and  (6.4.4). 


6.4.3  Multi-Spectral  Image  Fusion 

Image  fusion  is  often  involved  in  remote  sensing:  modern  spectral  sensors  gather  up 
to  several  hundreds  of  spectral  bands  which  can  be  either  visualized  and  processed  in¬ 
dividually,  or  can  be  fused  into  a  single  image,  depending  on  the  image  analysis  task. 
Image  fusion  of  two  bands  from  a  multispectral  sensor  with  our  multiscale  informa¬ 
tion  theoretic  approach  is  illustrated  in  Figure  (6.5).  For  comparison  purpose,  fusion 
by  pixel  averaging  and  by  multiscale  based  maximum  selection  scheme  are  shown  in 
Figures  (6.5.3)  and  (6.5.4). 
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6.4.4  Multi-Focus  Image  Fusion 

Due  to  the  limited  depth-of-focus  of  optical  lenses,  it  is  often  impossible  to  get  an  image 
which  contains  all  relevant  objects  'in  focus'.  One  possibility  to  overcome  this  problem 
is  to  take  several  pictures  with  different  focus  points  and  combine  them  together  into 
a  single  frame  which  finally  contains  the  focused  regions  of  all  input  images.  Figure 
(6.6)  illustrates  our  multiscale  information  theoretic  fusion  approach.  For  comparison 
purpose,  fusion  by  pixel  averaging  and  by  multiscale  based  maximum  selection  scheme 
are  shown  in  Figures  (6.6.3)  and  (6.6.4). 

6.4.5  Performance  Measures 

It  is  difficult  to  define  a  general  performance  measure  for  fusion  algorithms.  Some  per¬ 
formance  metrics  which  are  widely  used  in  signal  and  image  processing  do  not  fit  the 
application  of  image  fusion.  One  such  example  is  mean  square  error.  Let  /i,  /2, ...,  /„  : 
[1,A]  X  [1,M]  ^  Mbe  digital  images  of  the  same  scene  taken  from  different  sensors  and 
/  be  the  fusion  result.  A  cost  function  characterizing  the  mean  square  error  between 
the  fused  image  and  source  inputs  may  be  defined  as 

1  ” 

(«8) 

n  ^ ^ 

1=1 

where  ||  •  ||  denote  I2  norm.  It  is  easy  to  find  out  that  fusion  by  pixel  averaging,  i.e. 

1  ” 

h  =  (6.19) 

i=l 

minimizes  this  cost.  However,  a  pixel  averaging  method  is  not  accepted  as  the  best 
fusion  scheme  [23].  It  is  noted  earlier  that  pixel  averaging  has  a  muddy  appearance. 
This  is  due  primarily  to  the  fact  that  averaging  results  in  reduced  contrast  for  all  the 
patterns  which  appear  in  only  one  source  and  a  mixture  of  different  patterns  which 
come  from  different  sources. 

Another  possible  candidate  of  performance  measure  for  image  fusion  is  the  correlation 
between  the  fused  image  and  all  the  source  inputs.  The  higher  correlation,  the  better 
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C(/l,/2) 

CiflJa) 

Cif2ja) 

CiflJm) 

CiflJo) 

C{f2,fo) 

C{hJa)+C{hJa) 

C{hJm)+C{f2jm) 

C{flJo)+C{f2jo) 

Multi- 

0.0498 

0.5752 

0.8457 

0.4281 

0.8703 

0.4867 

0.8405 

modality 

1.4208 

1.2985 

1.3272 

Multi- 

-0.0739 

0.5790 

0.7703 

0.7928 

0.3991 

0.7663 

0.4455 

sensor 

1.3493 

1.1919 

1.2118 

Multi- 

0.8089 

0.9501 

0.9519 

0.8959 

0.9345 

0.9231 

0.9332 

spectral 

1.9020 

1.8303 

1.8563 

Multi- 

0.9542 

0.9884 

0.9886 

0.9791 

0.9766 

0.9871 

0.9778 

focus 

1.9770 

1.9557 

1.9649 

Table  6.1:  Correlations  between  different  fusion  results  and  source  images,  /i,  /2  denote 
two  source  images,  fa,  fm,  fo  denote  fusion  outputs  by  pixel  averaging,  wavelet-based 
maximum  selection  and  the  proposed  method  respectively. 


performance.  Let  f,g:  [1,  A]  x  [1,  M]  ^  M  denote  two  digital  images.  The  correlation 
between  /  and  g  is  defined  as 


C{f,g) 


\/Ef=i  E“i(/(b  J)  -  If  Ef=i  J)  - 


(6.20) 


where  /,  g  stands  for  the  mean  value  of  /,  g  respectively.  Then  a  performance  metric 
based  on  correlation  may  be  defined  as 


p  =  Y,\C(fJi)\.  (6.21) 

i=l 

where  /  is  the  fusion  output  and  //s  are  source  images.  However,  maximizing  correla¬ 
tion  is  closely  related  to  minimizing  the  mean  square  error.  Fusion  by  pixel  averaging 
usually  maximizes  the  overall  correlation  given  by  Equation  (6.21).  Table  (6.1)  lists  the 
correlations  between  different  fusion  outputs  and  source  images  in  the  experiments  of 
multi-sensor,  multi-modality,  multi-spectral  and  multi-focus  image  fusion.  Following 
the  same  argument  for  the  cost  function  of  mean  square  error,  we  conclude  that  corre¬ 
lation  is  also  not  a  good  choice  to  measure  the  performance  of  image  fusion. 
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If  the  "ground  truth"  of  the  fusion  result  is  known,  we  can  perform  a  quantitative  per¬ 
formance  measure  to  compare  different  fusion  algorithms.  For  the  above  experiment  of 
multi-focus  fusion,  an  ideal  image  should  contain  both  well  focused  clocks  and  it  may 
be  constructed  manually  by  cut  and  paste,  as  demonstrated  in  Figure  (6.6.5). 

Let  fa,  /  :  [1,  A]  X  [1,  M]  ^  M  denote  the  ideal  image  and  fusion  result  respectively.  A 
performance  measure  p  can  be  defined  as  the  standard  deviation  of  the  error  between 
fd  and  /, 


P 


NM 


(6.22) 


Table  (6.2)  summarizes  the  standard  deviation  between  an  ideal  image  and  fusion  re¬ 
sults  by  different  algorithms.  The  proposed  information  theoretic  approach  clearly  gen¬ 
erates  a  fusion  result  that  is  the  closest  to  the  ideal  image  among  the  outputs  of  the  pixel 
averaging  and  wavelet-based  maximum  selection  scheme. 

It  has  to  be  pointed  out  that  this  method  is  restricted  to  specially  constructed  images 
and  generally  not  applicable  to  real  multi-sensor  data  where  the  ideal  fusion  is  ill- 
defined  and  caimot  be  obtained.  There  is  no  general  quantitative  performance  measure 
for  image  fusion  algorithms  in  the  current  literature  except  for  specific  applications  [96]. 
The  fusion  results  are  mostly  evaluated  visually. 


pixel  averaging 

maximum  selection 

proposed  algorithm 

p 

7.816 

10.794 

6.573 

Table  6.2:  Standard  deviation  between  an  ideal  image  and  fusion  results  by  different 
algorithms. 


6.5  Conclusion 

In  this  chapter,  we  derive  a  new  multiscale  image  fusion  algorithm,  which  aims  at  in¬ 
tegrating  complementary  information  from  multi-sensor  data  so  that  the  fused  images 
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are  more  suitable  for  visual  perception.  We  formulate  the  image  fusion  as  an  optimiza¬ 
tion  problem  to  which  we  propose  a  solution. 

As  a  first  step,  a  biorthogonal  wavelet  transform  of  each  source  image  is  calculated  to 
generate  a  scale  space  representation.  Biorthogonal  wavelets  can  be  synthesized  with 
perfect  reconstruction  filters  having  a  linear  phase,  which  is  a  desirable  property  for 
image  fusion  applications. 

In  contrast  to  the  "choose  max"  type  of  selection  rules,  our  proposed  technique  relies 
on  the  intrinsic  statistical  structure.  Using  spatially  specific  wavelet  coefficients  from 
fine  to  coarse  scales,  we  construct  activity  pattern  vectors  which  we  compare  using  a 
new  Jensen-Renyi  divergence.  The  resulting  decision  map  makes  our  approach  more 
effective  in  preserving  significant  features  from  all  the  sources  without  suffering  from 
artifacts. 

We  have  successfully  tested  the  new  scheme  on  fusion  of  multi-sensor  (low-light-television 
and  forward-looking-infrared),  multi-modality  (CT  and  MRI),  multi-spectral,  and  multi¬ 
focus  images.  Quantitative  performance  measure  for  fusion  of  synthetic  test  images 
and  visual  evaluation  for  real  multi-sensor  image  fusion  demonstrates  that  the  pre¬ 
sented  algorithm  clearly  outperforms  pixel  averaging  and  wavelet  based  maximum 
selection  fusion  schemes. 
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Figure  6.3:  Multi-sensor  image  fusion:  (1)  a  low-light-television  sensor  image;  (2)  a 
forward-looking-infrared  image;  (3)  fusion  by  averaging;  (4)  fusion  by  wavelet  based 
maximum  selection  scheme;  (5)  a  selection  map;  (6)  fusion  by  the  proposed  information 
theoretic  approach. 
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Figure  6.4:  Multi-modality  image  fusion:  (1)  a  CT  image;  (2)  a  MRI  image;  (3)  fusion  by 
averaging;  (4)  fusion  by  wavelet  based  maximum  selection  scheme;  (5)  a  selection  map; 
(6)  fusion  by  the  proposed  information  theoretic  approach. 
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Figure  6.5:  Multi-spectral  image  fusion:  (1)  a  high  resolution  remote  sensing  image;  (2) 
a  low  resolution  remote  sensing  image;  (3)  fusion  by  averaging;  (4)  fusion  by  wavelet 
based  maximum  selection  scheme;  (5)  a  selection  map;  (6)  fusion  by  the  proposed  in¬ 
formation  theoretic  approach. 
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Figure  6.6:  Multi-focus  image  fusion:  (1)  an  image  focused  on  the  larger  clock;  (2)  an 
image  focused  on  the  smaller  clock;  (3)  fusion  by  averaging;  (4)  fusion  by  wavelet  based 
maximum  selection  scheme;  (5)  a  perfectly  fused  image  obtained  by  manually  cut  and 
paste;  (6)  fusion  by  the  proposed  information  theoretic  approach. 
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Chapter 

0 _ 

■  Shape  Recognition 


The  geometrical  description  of  an  object  can  be  decomposed  into  registration  and 
shape  information.  For  example,  an  object's  location,  rotation  and  size  could  be 
the  registration  information  and  the  geometrical  information  that  remains  is  the  shape 
of  the  object.  An  object's  shape  is  invariant  under  registration  transformations  and  two 
objects  have  the  same  shape  if  they  can  be  registered  to  match  exactly. 

The  pioneers  of  this  topic  of  general  shape  and  registration  analysis  are  Kendall  [30] 
and  Bookstein  [31].  Some  reference  and  reviews  include  Goodall  [32],  Kent  [33],  Dry- 
den  and  Mardia  [34]. 

The  definition  of  general  shape  spaces  and  shape  distance  is  introduced  in  Section  7.1. 
In  Section  7.2  we  describe  the  matching  of  two  configurations  under  Euclidean  simi¬ 
larity  transformations.  In  Section  7.3  we  briefly  describe  the  matching  of  two  config¬ 
urations  under  affine  transformations.  We  extend  the  work  to  consider  the  robustness 
issues  and  matching  by  estimation  in  Section  7.4. 
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7.1  Shape  Space  and  Shape  Distance 

In  this  section,  we  investigate  further  geometrical  aspects  of  shape.  Firsf  we  infroduce 
fhe  concepts  of  a  shape  configuration  and  of  a  pre-shape  space. 

Definition  7.1  A  configuration  X  consists  of  k  labeled  points  in  a  m-dimensional  Euclidean 
space. 

Two  configurations,  X  and  Y,  have  the  same  general  shape  iiY  =  g{X)^  g  ^  G.G  is  the 
set  of  Euclidean  fransformafions,  including  franslation,  rofafion  and  isofropic  scaling, 
which  is  given  by 

Y  =  rXT  +  IkC^  :  r  e  M+ ,  r  e  SO(m),  c  e  M™  (7.1) 

where  c  is  a  franslation  m-vector,  Ik  is  k-vector  of  ones,  T  is  a  m  x  m  orfhogonal  mafrix 
wifh  det(r)  =  1  and  r  >  0  denotes  an  isotropic  scaling. 

In  order  to  represent  shape,  it  is  convenient  to  remove  similarity  transforms  one  at  a 
time.  Translation  is  the  easiest  to  filter  out  of  X  and  can  be  achieved  following  ifs 
producf  by  a  Helmerf  sub-matrix. 

Xh  =  hx  e  (7.2) 

A  Helmert  sub-matrix  H  is  a  {k  —  1)  x  k  Helmert  matrix  without  its  first  row.  A  full 
Helmerf  matrix  is  a  square  k  x  k  orthogonal  matrix  with  its  first  row  element  equal 
to  1  / Vk,  and  the  remaining  rows  are  orthogonal  to  the  first  row.  We  drop  the  first  row 
of  so  fhaf  the  transformed  HX  does  not  depend  on  the  original  locations  of  the 
configuration. 

Definition  7.2  The  row  of  the  Helmert  sub-matrix  H  is  given  by 

{hj,  hj,  ...hj, -jhj,  0,  ...0),  hj  =  -{j{j  +  1))“^/^  (7.3) 

so  that  the  row  consists  of  hj  repeated  j  times,  followed  by  —jhj  and  then  k  —  j  —  I  zeros, 
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For  example,  a  full  Helmert  matrix  for  A:  =  3  is 


1/a/3  1/V3  1/a/3 

-1/a/2  l/y/2  0 

-1/a/6  -1/VG  2/v^ 


while  its  corresponding  Helmert  sub-matrix  is 


-1/a/2  1/v^  0 

-1/a/6  -1/v^  2/V6 


The  pre-shape  of  a  configuration  matrix  X  has  all  the  information  about  location  and 
scale  removed. 


Definition  7.3  The  pre-shape  of  a  configuration  matrix  X  is  given  by 


Xh  HX 
Xh  II  “  II  HX 


(7.4) 


where  H  is  a  Helmert  sub-matrix.  The  pre-shape  is  invariant  under  translation  and  scaling  of 
the  original  configuration. 


Notice  that  a  pre-shape  of  configuration  X  has  a  dimension  of  {k  —  l)m  —  1.  In  order 
to  remove  rotation  information  from  the  configuration,  we  identify  all  rotated  versions 
of  the  pre-shape  with  each  other,  and  this  equivalence  class  is  the  shape  of  X.  An 
alternative  definition  of  the  shape  of  X  is  given  by  the  following  definition. 

Definition  7.4  The  shape  of  a  configuration  matrix  X  is  all  the  geometrical  information  about 
X  that  is  invariant  under  Euclidean  similarity  transformations,  i.e.,  location,  rotation  and 
isotropic  scaling.  The  shape  can  be  represented  by  the  set  [X]  given  by 

[X]  =  {ZT:T  e  SO(m)}  (7.5) 

where  SO{m)  is  the  orthogonal  group  of  rotations  and  Z  is  the  pre-shape  of  X. 
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Definition  7.5  The  shape  space  [97]  is  the  set  of  all  the  possible  shapes.  Formally,  the  shape 
space  Y^^rn  space  of  the  non-coincident  k-point  set  configurations  in  E™  under  the 

action  of  the  Euclidean  similarity  transformations. 


A  suitable  distance  in  YT  Procrustean  distance.  Consider  two  configuration 

matrices  Xi  and  X2  with  pre-shapes  Zi  and  Z2,  we  minimize  over  rotations  and  scale 
to  find  the  closest  Euclidean  distance  between  Zi  and  Z2. 

Definition  7.6  The  full  Procrustean  distance  [34]  between  Xi  and  X2  is  given  by 

df(Xi,X2)=  inf  \\Z2-I3Z^T\\  (7.6) 

reso(m),/3eM 

It  can  be  shown  [30]  that 

d^(Xi,X2)  =  inf  \\Z2-I5Z^T\\ 

rGSO(m),/3eM 

=  a/1  —  [trace(A)]2 

=  sinp.  (7.7) 

where  p  is  a  Riemannian  metric,  which  is  given  by 

p{Zi,Z2)  =  arccos(trace(A)),  (7.8) 

where  the  matrix  A  is  the  diagonal  m  x  m  matrix  with  the  positive  elements  given  by 
the  square  root  of  the  eigenvalues  of  Z2Zf  Zi,  except  the  smallest  diagonal  element 
which  is  negative  if  we  have  det{Zf  Zf)  <  0. 

Another  general  shape  space  of  interest  is  the  affine  shape  space.  The  set  of  affine 
transformations  of  a  configuration  X{k  x  m  matrix)  is  given  by 

{XA-Plkc^  :  A  e  GL(m)}, 

where  A  is  a  m  x  m  matrix  in  the  general  linear  group  of  invertible  m  x  m  matrices 
GL(m),  and  c  is  a  translation  m-vector.  Note  it  A:  <  m  -I-  1  all  the  configurations  have 
the  same  affine  shape,  so  we  require  k  >  m  1  for  non-trivial  affine  shape. 
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Definition  7.7  The  affine  shape  space  is  the  orbit  space  of  the  non-coincident  k-point  set  con¬ 
figurations  in  M"*  under  the  action  of  Euclidean  affine  transformations. 

A  suitable  distance  in  the  affine  shape  space  between  Xi  and  X2  is  given  by 


\\X,{XfX,)-y^-{X2A  +  lkC^)\\ 


dA{Xi,X2) 


(7.9) 


inf 

aGGL(m),ceM 


The  post-multiplication  of  Xi  by  (XfXi)  ensures  that  dA{Xi,  X2)  =  dA{X2,  Ai). 

7.2  Euclidean  Shape  Matching 

Consider  configuration  of  A;  >  m  -t  1  points  in  m  dimension.  To  match  two  configura¬ 
tions  Y  and  T  (both  k  x  m  matrices),  we  have 


r  =  rTT  +  IkC^  ^E  =  XB^E 


(7.10) 


where  c  e  M™  is  a  translation  m-vector.  Ijt  is  the  A; -vector  ones,  T  is  a  m  x  m  orthog¬ 
onal  matrix  with  det(r)  =  1  and  r  >  0  denotes  an  isotropic  scaling.  E  is  the  k  x  m 
error  matrix,  which  is  assumed  to  be  modeled  by  an  independent  multivariate  normal 
distribution.  Let  X  =  [1^,  T]  be  the  design  matrix,  and  B  =  [c,  rT]^  be  the  regression 
parameters.  The  most  straightforward  approach  to  estimating  the  regression  parame¬ 
ters  is  by  a  least  squares  approach,  i.e.,  minimizing  s‘^{E)  =  \\E\f  =  trace (£'^£').  To 
simplify  calculation,  we  first  obtain  the  pre-shapes  of  Y  and  T,  denoted  by  Y  and  T 
respectively.  In  this  way,  c  is  set  to  be  0.  Then  we  have 


diAy.T)  =  4.Ay.T) 

=  n  -  rfrfiY  -  rfT)) 


rGSO(m),r 


Consider  a  singular  value  decomposition  of  Y^f  given  by 


(7.11) 


=  VXU^ 
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Figure  7.1:  A  shape  database  in  experiment 


where  y  G  SO(m),  A  =  diag{Xi,  X2, X^)  with  A 1  >  A2,...,>  A^-i  >  |Am|-  Note 
that  Am  <  0  if  and  only  if  det(F^T)  <  0.  Hence  the  last  term  in  the  right-hand  side  of 
Equation  (7.11)  is  equivalent  to 

m 

—  2r  sup  trace  (FA)  = —2r  sup  ^FjjAj, 
reso(m)  reso(m) 

where  (Fn,  r22,  •••,  Fmm)  are  the  diagonals  of  F.  Note  that  the  set  of  diagonals  of  F  in 
SO(m)  is  a  compact  convex  set  [98]  with  extreme  points 

{(±1,±1,...,±1)} 

with  an  even  number  of  minus  signs.  Hence  it  is  clear  in  our  case  that  the  supremum  is 
achieved  when  T'a  =  l,i  =  1,  2, ...,  m.  Therefore, 

=  inf  nil'll^  +  rlf||2  -  .  (7.12) 

By  differentiation  we  obtain 

m 

f  =  ^A,. 

2=1 


(7.13) 
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It  is  now  clear  the  minimizing  rotation  matrix  is  given  by  f  =  UV^  since 

trace(y^Tf)  =  trace{V  AU^UV^)  =  trace(A). 

Based  on  the  above  derivation,  a  suitable  symmetric  residual  discrepancy  measure  in¬ 
volves  the  matching  of  the  pre-shapes  under  Euclidean  similarity  transformations  and 
is  given  by 

dGF{Y,T)  =  inf  ||F  -  [l,,f][c,rr]^||  =  ^(1  -  (trace(A))2).  (7.14) 

Referring  to  equation  (7.7),  we  can  see  that  the  symmetric  residual  discrepancy  mea¬ 
sure  corresponds  to  a  full  Procrustean  distance  in  shape  space. 

An  aircraft  database  of  18  templates  and  a  target  configuration  we  extracted  from  an 
ISAR  image  are  shown  in  Figure  (7.1).  The  target  configuration  can  be  modeled  by  a 
translation  of  MIG-25  in  the  database  plus  some  distortions.  We  center  and  normalize 
it  to  a  pre-shape  format.  Symmetric  discrepancy  measures  between  the  target  and  tem¬ 
plates  in  database  are  calculated  and  listed  in  the  Table  (7.1).  As  we  can  see,  the  shape 
distance  between  the  target  and  MIG-25  has  the  minimal  value. 


A-4 

A-7 

A-10 

B-13 

B-52 

Buccaneer 

dGF{Y,T) 

0.3416 

0.3460 

0.4339 

0.2776 

0.5492 

0.2645 

F-111 

F-4 

MIG-25 

Jaguar 

MIG-27 

Mirage-IV 

dGF{Y,T) 

0.3653 

0.3336 

0.0501 

0.2930 

0.2737 

0.3837 

Su-24 

Su-7 

Tu-22 

Tu-26 

Viggen 

Harrier 

dGF{Y,T) 

0.2425 

0.3197 

0.2491 

0.2408 

0.4086 

0.3086 

Table  7.1:  Symmetric  discrepancy  measures  between  the  target  and  aircraft  templates. 


7.3  Affine  Shape  Matching 

For  k  >  m  1  points  in  E™,  the  set  of  transformations  G  is  the  group  of  affine  trans¬ 
formations.  Matching  two  configurations  Y  and  T  (both  k  x  m  matrices),  we  have  a 
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(m  +  1)  X  m  affine  transformation  matrix  B  such  that 

Y  =  [lk,T]B  +  E  =  XB  +  E  (7.15) 

where  E  is  the  k  x  m  error  matrix,  which  is  assumed  to  be  modeled  by  an  independent 
multivariate  normal  distribution.  X  =  [lk,T]  is  the  design  matrix.  The  least  square 
solution  is  given  by 

B  =  {X'^Xy^X'^Y,  (7.16) 

where  we  assume  the  rank  of  X  to  be  m  +  1.  A  symmetric  discrepancy  measure  can  be 
obtained  by 

dGA{Y,T)  =  \\Y{Y^Y)-Y^  -  X{X^X)-^X^Y\\.  (7.17) 

This  measure  has  the  property  that 

dGA{Y,T)  =  dGA{Y,TA+lkby  =  dGA{T,Y)  =  dGA{YA+lkb^,T) 
assuming  T,  Y  are  of  rank  m,  A  is  a  full  rank  m  x  m  matrix  and  6  is  a  m-vector. 


7.4  Estimation  by  Matching 


If  we  have  N  random  samples  of  a  object,  Tj  e  j  =  1,  2,  ...X,  then  it  is  of  interest 
to  obtain  an  average  configuration  fi  {a  k  x  m  matrix)  and  to  explore  the  structure  of 
viability,  up  to  invariance  in  the  set  of  transformations  G. 

An  estimate  of  the  population  mean  configuration  //  up  to  invariance  in  G,  denoted  by 
jl,  can  be  obtained  by  simultaneously  matching  each  to  //  (j  =  1,  2,  ...X)  and  choosing 
ji  as  a  suitable  estimate  (subject  to  certain  constraints  on  //).  In  particular,  fi  is  obtained 
from  the  constrained  minimization 


N 


fl  = 


arg  inf  ^  inf  (j){s{gj{Tj)  -  //)) 

n  f  *  O' 


A*  “  9j 
J=1 


(7.18) 


where  (f){x)  is  a  penalty  function  on  M+,  s{E)  is  the  objective  function  for  matching  two 
configurations,  and  in  general  restrictions  need  to  be  imposed  on  g  to  avoid  degener¬ 
acy.  A  common  choice  of  estimator  is  (f){x)  =  x"^,  i.e.,  a  least  square  choice. 
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For  the  generalized  shape  matching  in  two  dimensions,  given  N  random  sample  con¬ 
figurations  in  the  complex  plane  Ti  , Tff,  {Tj  e  C^,  complex  A: -vectors),  we  wish  to 
obtain  an  average  shape,  where  the  set  of  transformations  G  is  taken  to  be  the  Eu¬ 
clidean  similarity  group.  Upon  selecting  (f){x)  =  for  a  least  square  matching,  we  let 
Xj  =  [Ijt,  Tj],j  =  1,  2, A  be  a  A;  x  2  design  matrix,  and  define 


U  ~  F 


where  Bj  are  the  2x1  complex  parameters  for  matching  the  configuration  to  a  com¬ 
plex  mean  //  (a  A:  x  1  complex  vector)  with  //  restricted  to  be  centered  =  0)  and 

having  unit  size  (//*//  =  1).  "  *  "  denotes  complex  conjugate,  ej  is  assumed  to  be  mod¬ 
eled  by  an  independent  multivariate  normal  distribution. 

The  N  configurations  can  hence  be  matched  using  a  least  square  criterion,  i.e.,  mini¬ 
mizing  the  function  ^j^*j  over  and  //.  Fixing  //,  the  solution  is  clearly  given 

by 


4- =  (j  =  l,2,...A) 


(7.19) 


where  X*Xj  is  assumed  of  rank  2,  i.e.,  the  points  in  Tj  are  not  all  coincident.  This  yields. 


e  —  {Ik 


(7.20) 


where  Hj  is  the  estimated  hat  matrix  for  Xj  and  given  by 


Consequently,  we  have  [34], 

fi  =  arginf  jT  Aji,  (7.21) 

where  A  =  “  ^j)-  Hence,  fi  is  given  by  a  complex  eigenvector  corresponding 

to  the  smallest  non-zero  eigenvalue  of  A,  which  is  equivalent  to  finding  an  eigenvector 
corresponding  to  the  largest  eigenvalue  of 

N 

s  =  Y,Tjf;, 


(7.22) 
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Figure  7.2:  A  set  of  16  noisy  observation  of  a  B-52  shape 


Figure  7.3:  Estimated  mean  configuration  of  a  B-52  shape 
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A-4 

A-7 

A-10 

B-13 

B-52 

Buccaneer 

dGF{Y,T) 

0.4792 

0.4091 

0.3661 

0.5675 

0.0081 

0.4249 

F-111 

F-4 

Harrier 

Jaguar 

MIG-27 

Mirage-IV 

dGF{Y,T) 

0.7045 

0.5488 

0.5173 

0.6030 

0.5850 

0.7195 

Su-24 

Su-7 

Tu-22 

Tu-26 

Viggen 

MIG-25 

dGF{Y,T) 

0.4754 

0.4936 

0.4604 

0.5477 

0.6974 

0.5494 

Table  7.2:  Shape  distances  between  the  estimated  mean  shape  and  templates  in  the 
database 


1^2 

b^4 

b^5 

be 

b^7 

dGFi^ij  T) 

0.0369 

0.0417 

0.0393 

0.0376 

0.0426 

0.0399 

0.0396 

0.0397 

bio 

bn 

b^i2 

bi3 

bj4 

bi5 

b^ie 

dGF(H,T) 

0.0389 

0.0385 

0.0389 

0.0408 

0.0398 

0.0398 

0.0392 

0.0421 

Table  7.3:  Shape  distances  between  noisy  configurations  to  the  B-52  template. 


where  fj  is  the  pre-shape  of  Tj.  The  shape  corresponding  to  this  eigenvector  is  hence 
the  unique  least  square  estimate  provided  that  there  is  a  single  greatest  eigenvalue.  The 
average  shape  from  this  procedure  is  called  the  full  procrustean  mean. 

Figure  (7.2)  shows  a  set  of  noisy  configurations  (128  x  1  complex  vectors)  of  B-52. 
The  estimated  mean  shape  calculated  by  Equation  (7.21)  is  shown  in  Figure  (7.3).  Table 
(7.2)  lists  the  shape  distances  between  the  estimated  mean  shape  and  templates  in  the 
aircraft  database.  Table  (7.3)  lists  the  shape  distances  between  noisy  configurations  to 
the  B-52  template.  It's  clear  that  matching  by  way  of  a  mean  shape  has  a  better  result. 
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Chapter 


cussions 


Conclusions  and  Dis- 


IN  this  chapter,  we  briefly  summarize  the  contributions  of  this  thesis  and  the  overall 
conclusions  which  can  be  derived  from  the  results  of  our  research.  We  also  present 
some  suggestions  for  extending  this  work. 


8.1  Conclusions 


The  main  contributions  of  fhis  thesis  can  be  categorized  as  follows: 


•  Information  theoretic  approach  for  ISAR  image  focusing  and  multiscale  data  fu¬ 
sion 

A  new  generalized  divergence  measure,  the  Jensen-Renyi  divergence,  is  proposed.  We 
prove  the  convexity  of  this  divergence  measure,  derive  its  maximum  value,  and  an¬ 
alyze  its  upper  bounds  in  terms  of  the  Bayes  error  in  statistical  pattern  recognition. 
Based  on  the  Jensen-Renyi  divergence,  we  propose  a  new  approach  to  the  problem  of 
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ISAR  image  registration.  This  is  accomplished  by  using  the  Jensen-Renyi  divergence  to 
measure  the  statistical  dependence  between  consecutive  ISAR  image  frames,  which  is 
maximal  if  the  images  are  geometrically  aligned.  Compared  to  the  mutual  information 
based  registration  techniques,  the  Jensen-Renyi  divergence  provides  an  ability  to  con¬ 
trol  the  measurement  sensitivity  of  the  joint  histogram.  This  flexibility  results  in  more 
robust  registration  in  the  presence  of  noise.  Maximization  of  the  Jensen-Renyi  diver¬ 
gence  is  a  very  general  criterion,  because  no  assumptions  are  made  in  regards  to  the 
nature  of  this  dependence  and  no  limiting  constraints  are  imposed  on  image  contents. 
Simulation  results  demonstrate  that  our  approach  achieves  an  effective  estimation  of 
the  target  motion  automatically  without  any  prior  feature  extraction.  Based  on  the  es¬ 
timated  motion  parameters,  translational  motion  compensation  (TMC),  and  rotational 
motion  compensation  (RMC)  can  be  used  to  generate  a  focused  image  of  the  target. 

In  order  to  integrate  complementary  information  from  multi-sensor  data  so  that  the 
fused  images  are  more  suitable  for  visual  perception  and  recognition,  we  derive  a  new 
information  theoretic  fusion  scheme  in  a  multiscale  framework.  We  formulate  the  im¬ 
age  fusion  as  an  optimization  problem  for  which  we  propose  a  solution.  We  have  suc¬ 
cessfully  tested  the  new  scheme  on  fusion  of  multi-sensor  (low-light-television  and 
forward-looking-infrared),  multi-modality  (CT  and  MRI),  multi-spectral,  and  multi¬ 
focus  images.  Quantitative  performance  measure  for  fusion  of  synthetic  test  images 
and  visual  evaluation  for  real  multi-sensor  image  fusion  demonstrates  that  the  pre¬ 
sented  algorithm  clearly  outperforms  pixel  averaging  and  wavelet  based  maximum 
selection  fusion  schemes. 


•  Multiscale  shape  enhancement  and  shape  analysis 

As  a  pre-processing  step  for  the  shapes  extracted  from  ISAR  images,  we  propose  a 
novel  non-linear  smoothness-constrained  filtering  technique.  The  key  idea  is  to  sepa¬ 
rate  the  signal  portion  from  its  measured  data,  and  to  preserve  the  original  smoothness 
property  of  the  underlying  shape.  Using  notations  of  Holder  spaces  and  Holder  expo¬ 
nent,  we  establish  results  of  signal  regularity  measurement  with  wavelets.  To  detect 


8.2.  SUGGESTIONS  FOR  FUTURE  RESEARCH 


145 


the  singular  points  of  signal  from  measured  data,  we  turned  to  curve  shortening  and 
derived  the  partial  differential  equations  that  characterize  the  evolution  of  curvature. 
A  new  singularity  detection  method  by  tracking  the  curvature  extrema  across  scales  is 
proposed  and  a  multiscale  curvature  mask  is  generated.  Then  we  proceed  to  project 
measured  data  into  the  wavelet  domain  and  suppress  wavelet  coefficients  by  this  mul¬ 
tiscale  curvature  mask.  For  a  piecewise  smooth  signal,  it  was  shown  that  filtering  by 
this  curvature  mask  is  equivalent  to  keeping  the  signal  pointwise  Holder  exponents  at 
the  singular  points  of  the  underlying  signal,  and  to  lifting  its  smoothness  at  all  the  re¬ 
maining  points. 

To  identify  a  shape  independently  of  ifs  registrafion  information,  we  finally  propose 
matching  two  configurations  by  regression,  using  notations  of  general  shape  spaces 
and  procrusfean  disfances.  In  particular,  we  study  the  generalized  Euclidean  and  affine 
matching  by  estimating  a  mean  configuration  in  two  dimensions.  Simulation  results 
show  that  matching  by  way  of  a  mean  configuration  is  more  robust  than  matching  tar¬ 
get  shapes  directly. 


8.2  Suggestions  for  future  research 

In  this  section,  we  provide  some  discussions  and  suggestions  for  extending  the  research 
presented  in  this  thesis. 

•  Optimal  choice  of  exponential  order  for  the  Jensen-Renyi  divergence 

A  large  fraction  of  our  effort  in  Chapter  3  was  focused  on  searching  for  a  spatial  trans¬ 
formation  such  that  a  similarity  metric  achieves  its  maximum  between  two  images 
taken  at  different  times,  from  differenf  sensors,  or  from  different  viewpoints.  We  pro¬ 
pose  a  general  framework  based  on  the  Jensen-Renyi  divergence  for  the  purpose  of 
image  registrafion  and  establish  the  optimal  choice  of  weighf  vecfor  oj.  For  the  selec¬ 
tion  of  exponential  order  a,  there  is  a  trade  off  between  optimality  and  practicality.  We 
conclude  that  a  =  0  is  best  for  the  ideal  case  of  registration,  if  one  can  exacfly  model 
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the  misalignment  between  two  images.  It  is,  however,  also  the  least  robust  selection,  as 
it  tends  to  make  all  the  p's  the  same  as  the  uniform  distribution,  if  is  not  degenerate 
distribution  and  pij  >  0.  Then  the  Jensen-Renyi  divergence  would  be  zero  for  the  whole 
transformation  parameter  space  as  in  case  where  the  adapted  transformation  group  can 
not  completely  model  the  misalignment.  On  the  other  hand,  o  =  1  is  the  most  robust 
choice,  in  spite  of  also  resulting  in  the  least  sharp  peak.  In  the  future  work,  I  would 
suggest  formulating  a  cost  function  over  a  and  solving  its  optimal  value  by  minimiz¬ 
ing  that  cost  function.  The  difficulty  is  to  find  such  a  cosf  function  that  it  is  convex  in  a 
and  also  reasonable  for  a  specific  application. 

•  Registration  in  the  presence  of  local  variation 

For  the  purpose  of  ISAR  image  focusing,  local  variations  may  be  presented  in  the  con¬ 
secutive  image  frames  due  to  the  target  motion  and  other  perturbations.  If  such  local 
variations  are  severe,  a  global  fransform  and  a  local  deformation  may  be  used  fogether 
fo  aid  the  Jensen-Renyi  divergence  to  identify  the  registration  point.  Even  though  in  the 
motion  compensation  stage,  the  local  transformation  is  not  required.  The  local  defor¬ 
mation  would  also  help  to  determine  the  maximal  integration  angle,  since  it  indicates 
the  projection  variation  of  targef  reflectivify  density  onto  the  imaging  plane.  Given  two 
ISAR  image  frames  /i  and  /2,  an  estimate  of  targef  motion  parameters  (f ,  0*,  7*)  is  then 
given  by 

(r,r,7*)  =  argmaxD^^^  {fi,giT(i,e,^)f2)) 

where  Djj^uj  (•)  is  an  induced  similarity  measure  based  on  Jensen-Renyi  divergence  of 
order  a  and  weighf  u.  s'  is  a  local  deformation.  A  good  candidate  for  such  a  local 
deformation  is  a  thin-plate  spline  [34].  The  thin-plate  spline  is  a  natural  interpolating 
function  for  dafa  in  fwo  dimensions  and  plays  a  similar  role  fo  the  natural  cubic  spline 
in  the  one  dimensional  case.  It  minimizes  the  total  bending  energy  of  all  possible  infer- 
polation  functions  [34].  Implementation  of  a  thin-plafe  spline  inferpolafion  is  sfraighf 
forward.  The  difficulfy  could  be  a  robusf  scheme  to  select  two  sets  of  feature  landmarks 
to  characterize  the  target  in  both  images.  These  two  sets  of  landmarks  are  necessary  to 
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calculate  the  parameters  in  a  thin-plate  spline  interpolation. 

•  Segmentation  techniques  to  construct  decision  maps  in  image  fusion 

Improving  the  segmentation  techniques  to  construct  a  decision  map  is  crucial  for  mul¬ 
tiscale  image  fusion.  I  would  suggest  applying  region  growing  methods  to  segment 
the  selection  map  in  the  future  research.  Region  growing  methods  are  well  suited  for 
the  selection  maps,  where  borders  are  difficult  to  detect.  Computation  cost  used  to  be  a 
problem,  however,  a  recently  developed  fast  watershed  algorithm  by  Vincent  and  Soille 
[99]  was  found  to  be  several  hundreds  of  times  faster  than  several  classical  algorithms. 
Image  segmented  by  region  growing  methods  sometimes  contain  either  too  many  re¬ 
gions  (under-growing)  or  too  few  regions  (over-growing)  as  a  result  of  non-optimal 
parameter  setting.  To  improve  classification  results,  a  variety  of  post-processors  [100] 
has  been  developed.  I  would  also  suggest  investigating  these  post-processors  to  fine 
tune  the  resulting  decision  map.  A  well  constructed  decision  map  could  reduce  the 
artifacts  introduced  by  switching  source  wavelet  coefficients  and  hence  improve  the 
quality  of  the  fusion  result. 
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